(54) Device using eigenspace method for recognizing hand shape and position



(57) An object of the present invention is to provide 
a device and a method for recognizing hand shape and 
position even if a hand image to be provided for recog- 
nition is rather complicated in shape, and a recording 
medium having a program for carrying out the method 
recorded thereon. 

A hand image normalization part 11 deletes a wrist
region respectively from a plurality of images varied in 
hand shape and position before subjecting the images 
to normalization in hand orientation and size to gener- 
ate hand shape images. An eigenspace calculation part 
13 calculates an eigenvalue and an eigenvector respec- 
tively from the hand shape images under an analysis 
based on an eigenspace method. An eigenspace pro- 
jection part 15 calculates eigenspace projection coordi- 
nates by projecting the hand shape images onto an 
eigenspace having the eigenvectors as a basis. A hand 
image normalization part 21 deletes a wrist region from 
an input hand image, and generates an input hand 
shape image by normalizing the input hand image to be 
equivalent to the hand shape images. An eigenspace 
projection part 22 calculates eigenspace projection 
coordinates for the input hand shape image by project- 
ing the same onto the eigenspace having the eigenvec- 
tors as the basis. A hand shape image selection part 23 
compares the eigenspace projection coordinates calcu- 
lated for the input hand shape image with each of the 
eigenspace projection coordinates calculated for the 



hand shape images, and then determines which of the 
hand shape images is closest to the input hand shape 
image. A shape/position output part 24 outputs shape 
information and position information on the determined 
hand shape image. 
Description

BACKGROUND OF THE INVENTION 
Field of the Invention 

[0001] The present invention relates to devices and 
methods for recognizing hand shape and position, and 
recording media each having a program for carrying out 
the methods recorded thereon, and more specifically to 
a device and a method for recognizing hand shape and 
position, without the help of an exemplary cable-con- 
nected data glove, in an applicable manner to man- 
machine interfaces and sign language recognition 
devices, for example, and to a recording medium having 
a program for carrying out the method recorded ther- 
eon. 

Description of the Background Art 

[0002] For a new human interface technique, cur- 
rently, research and development of a device which rec- 
ognizes human hand shape and grasps information 
conveyed thereby is actively conducted. Also, research 
for recognizing hand shape and position observed in
sign language is also active to support communications 
between the hearing impaired and the able-bodied. 
[0003] A general method for capturing human hand 
shape uses a sensor such as data glove to measure 
hand position and finger joint angles, and an exemplary 
well-known method is found in the document published 
by The Institute of Electrical Engineers of Japan, Instrumentation
and Measurement (pp. 49 to 56, 1994) (hereinafter,
referred to as first document). In the first
document, the glove is provided with optical fibers along 
every finger, and finger joint angles are estimated by a 
change in light intensity. 

[0004] A method for recognizing hand shape with- 
out the glove-type sensor as in the first document but 
with a camera is found in the document titled "Gesture 
Recognition Using Colored Gloves" by Watanabe, et al., 
(Publication of The Electronic Information Communica- 
tions Society, Vol. J80-D-2, No. 10, pp. 2713 to 2722) 
(hereinafter, referred to as second document). In the 
second document, images are captured through a mul- 
ticolored glove (marker) for hand shape recognition. 
[0005] An exemplary method for recognizing hand 
shape and position without such marker but with only a 
camera is disclosed in the Japanese Patent Laying- 
Open No. 8-263629 (96-263629) titled "Object 
Shape/Position Detector" (hereinafter, referred to as 
third document). In the third document, hand shape rec- 
ognition and hand position estimation are conducted 
through images captured by a camera placed in front of 
a hand. Herein, the method uses at least three cameras 
to photograph the hand, and the hand is taken in as a 
plane so as to determine to which camera the hand is 
facing. 



[0006] Another method for recognizing hand shape 
from images captured by a front-facing camera is found 
in the document titled "Real Time Vision-Based Hand 
Gesture Estimation For Human-Computer Interfaces"
by Ishibuchi, et al., (Publication of The Electronic
Information Communications Society, Vol. J79-D-2, No. 7,
pp. 1218 to 1229) (hereinafter, referred to as fourth
document). In the fourth document, from hand images
captured by a plurality of cameras, a direction from wrist to
middle finger (hereinafter, referred to as palm principal
axis) is determined, and the position of each fingertip is
also determined to count the number of extended fingers.

[0007] In recent years, to recognize object position
and type of face or car, for example, an image recognition
method, which is the combination of a dummy
image method and an eigenspace method, has been in
the spotlight. The dummy image method uses only
previously-captured 2D dummy images of a 3D object to
recognize the position and type thereof. The
eigenspace method is the one conventionally applied,
and uses an eigenspace structured by eigenvectors of a
covariance matrix (or autocorrelation matrix) obtained
through an operation performed on a matrix of
image data. In the eigenspace method, it is well-known
to apply principal component analysis or KL expansion
to images.

[0008] A technique for applying the principal component
analysis to images is briefly described below.

[0009] The principal component analysis is a statistical
technique utilizing an eigenspace. This is popular
as a technique in multivariate analysis, and is so carried
out that feature points on a multidimensional space are
represented on a space where the number of dimensions
is reduced. This is done to make the feature
points easier to see and handle. Fundamentally, feature
points on a multidimensional space are linearly
projected onto a lower-dimensional orthogonal subspace
where the spread (variance) of the points is high.

[0010] In a case where the principal component
analysis technique is applied to images, first, an image
unit including p images is expressed by

$$[U_1, U_2, U_3, \ldots, U_p],$$

where each U denotes a column vector obtained by subjecting
an image of n × m pixels to raster scanning.
[0011] Second, the average image c
obtained from the plurality of images is subtracted from the
respective column vectors in the image unit. Assuming
that an nm × p matrix structured by such column vectors
is A, the matrix A is expressed by

$$A = [U_1 - c,\; U_2 - c,\; \ldots,\; U_p - c],$$

and accordingly a covariance matrix Q is calculated by
the following equation (1). Note that a matrix $A^{T}$ indicates
a matrix transposed from the matrix A.

$$Q = A A^{T} \qquad (1)$$

[0012] Thereafter, a characteristic equation (2) is
solved by using the covariance matrix Q.

$$\lambda_i e_i = Q e_i \qquad (2)$$

[0013] Herein, assuming that the number of dimensions
of a to-be-structured subspace is k, the subspace
can be structured by using, as a basis, the eigenvectors

$$e_1, e_2, \ldots, e_k \qquad (\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k \ge \cdots \ge \lambda_p)$$

which correspond to the k largest eigenvalues.

[0014] In this manner, according to the following
equation (3), by linearly projecting a certain image x
onto the subspace represented by the eigenvectors, the
image in the n × m dimension can be represented by a
k-dimensional feature vector y in a lower-dimensional
space.

$$y = [e_1, e_2, \ldots, e_k]^{T} x \qquad (3)$$
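
Equations (1) to (3) can be illustrated with a minimal Python sketch. It is not taken from the cited documents: the function names are hypothetical, and the use of power iteration with deflation to solve equation (2) is an assumption made only to keep the example free of external libraries. It is practical only for very small images.

```python
import math


def average_image(columns):
    """Average image c of the column vectors U_1..U_p."""
    p = len(columns)
    return [sum(col[i] for col in columns) / p for i in range(len(columns[0]))]


def covariance_matrix(columns):
    """Q = A A^T of equation (1), where column j of A is U_j - c."""
    c = average_image(columns)
    centered = [[u - m for u, m in zip(col, c)] for col in columns]
    d = len(c)
    return [[sum(col[i] * col[j] for col in centered) for j in range(d)]
            for i in range(d)]


def largest_eigenpairs(q, k, iterations=500):
    """Approximate the k largest eigenpairs of symmetric Q (equation (2)).

    Power iteration with deflation is an assumption of this sketch.
    """
    d = len(q)
    work = [row[:] for row in q]
    pairs = []
    for n in range(k):
        v = [1.0 + 0.1 * (i + n) for i in range(d)]  # arbitrary non-zero start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining matrix is zero: nothing left to find
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        for i in range(d):  # deflation: remove the component just found
            for j in range(d):
                work[i][j] -= lam * v[i] * v[j]
    return pairs


def project(x, eigenvectors):
    """y = [e_1, ..., e_k]^T x of equation (3)."""
    return [sum(e_i * x_i for e_i, x_i in zip(e, x)) for e in eigenvectors]
```

For example, `project(x, [v for _, v in largest_eigenpairs(covariance_matrix(images), k)])` maps a raster-scanned image `x` to its k-dimensional feature vector. Equation (3) is followed literally here; subtracting the average image c from x before projecting is a common variant.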

[0015] An exemplary method for detecting and recognizing
any multifeatured entity such as human face
under principal component analysis or KL expansion is 
found in the Japanese Patent Laying-Open No. 8- 
339445 (96-339445) titled "Detection, Recognition and 
Coding of Complex Objects Using Probabilistic 
Eigenspace Analysis" (hereinafter, referred to as fifth 
document). The feature of the fifth document lies in a 
respect that the conventionally-known principal compo- 
nent analysis and KL expansion are applied to a multi- 
featured entity such as face. The fifth document 
exemplarily applies such techniques to recognize hand 
shape, and the method in the fifth document is 
described next below. 

[0016] First, a plurality of hand images captured 
through hand movement or gesture are photographed 
with a black background. Second, the two-dimensional 
contour of the hand is extracted by using Canny's edge
operator. Thereafter, the obtained edge images are sub- 
jected to the KL expansion to calculate a subspace. If 
an edge map in binary is used herein, however, the 
images may show little correlation with one another, and 
thus the number of dimensions k of the subspace needs 
to be increased to a considerable extent. By taking this 
into consideration, the example described in the fifth 
document proposes to calculate the subspace after 
blurring the edge images, on the edge map in binary, 
through distribution processing. In this manner, the 
number of dimensions of the subspace can be sup- 
pressed. Further, in the fifth document, the images are 
entirely searched on a predetermined size basis so as 
to find the hand location from an input image, and then 



recognition is carried out. 

[0017] However, for hand shape recognition, wearing
such data glove as in the first document may restrict
hand movement due to cords connected thereto, and a
user may feel uncomfortable because the glove
is tight.

[0018] In a case where hand shape recognition is
conducted by using a camera presumably together with
a marker such as glove, as in the second document, the
hand shape recognition cannot be achieved without the
glove, and the problem of discomfort is still left
unsolved.

[0019] Further, in a case where hand shape and
position recognition is conducted without the glove or
marker but with a plurality of cameras, as in the third
document, the hand is taken in as a plane so as to
determine to which camera the hand is facing. In reality,
however, the hand can be in a variety of shapes and
some shapes cannot be judged as being closely analogous
to the plane. Accordingly, the method can be
applied to recognize simple shapes formed only by
extending or bending fingers, for example, but not to
rather complicated shapes (e.g., a circle formed by a
thumb and an index finger).

[0020] Still further, in the method based on the conventional
eigenspace analysis as described in the fourth
document, it is not specified how to capture normalized
images only of hand. The importance for the method
based on the eigenspace analysis lies in how an image
region of an object is cut out before normalization.
When the object is a simple unit, it only needs to be
subjected to normalization with respect only to size and
contrast. On the other hand, when the object is complicated,
such as a hand or a face, it needs to be subjected
to cutting processing before normalization.

[0021] For example, when the method is applied to
face recognition, popularly, eye and nose regions are
first moved to a predetermined position, and then chin
and hair regions are deleted. When the method is
applied to hand recognition, a wrist region is first
deleted in some manner, and then the hand is moved to
a predetermined position for normalization. Without
such processing, the method based on the eigenspace
analysis may result in a low recognition rate for hand
shape and position recognition.

[0022] Still further, in a case where the eigenspace
analysis is applied to a human hand image as in the fifth
document, it is required to extract the contour of the
hand and blur an edge image. In this manner, it is
impossible to distinguish between an image of one finger
and an image of two fingers abutting each other;
therefore the method cannot be applied to rather complicated
shapes.



[0023] Therefore, an object of the present invention
is to provide a device and a method for recognizing
hand shape and position even if a hand image to be provided
for recognition is rather complicated in shape, and
a recording medium having a program for carrying out
the method recorded thereon. This is implemented by,
under a method based on the eigenspace analysis, normalizing
a plurality of prestored hand images varied in
hand shape and position and the to-be-provided hand
image after a wrist region is respectively deleted therefrom.

[0024] The present invention has the following features
to attain the object above.

[0025] A first aspect of the present invention is
directed to a device for recognizing hand shape and
position of a hand image obtained by optical read
means (hereinafter, referred to as input hand image),
the device comprising:

a first hand image normalization part for receiving a
plurality of hand images varied in hand shape and
position, and after a wrist region is respectively
deleted therefrom, subjecting the hand images to
normalization in a predetermined manner (in hand
orientation, image size, image contrast) to generate
hand shape images;

a hand shape image information storage part for
storing the hand shape images together with shape
information and position information about each of
the hand shape images;

an eigenspace calculation part for calculating an
eigenvalue and an eigenvector from each of the
hand shape images under analysis based on an
eigenspace method;

an eigenvector storage part for storing the eigenvectors;

a first eigenspace projection part for calculating
eigenspace projection coordinates respectively for
the hand shape images by projecting the hand
shape images onto an eigenspace having the
eigenvectors as a basis, and storing the
eigenspace projection coordinates into the hand
shape image information storage part;

a second hand image normalization part for receiving
the input hand image, and after a wrist region is
deleted therefrom, normalizing the input hand
image to generate an input hand shape image
being equivalent to the hand shape images;

a second eigenspace projection part for calculating
eigenspace projection coordinates for the input
hand shape image by projecting the input hand
shape image onto the eigenspace having the
eigenvectors as the basis;

a hand shape image selection part for comparing
the eigenspace projection coordinates calculated
by the second eigenspace projection part with the
eigenspace projection coordinates stored in the
hand shape image information storage part, and
determining which of the hand shape images is
closest to the input hand shape image; and

a shape/position output part for obtaining, for output,
the shape information and the position information
on the closest hand shape image from the hand
shape image information storage part.
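
The selection and output steps of the first aspect can be sketched as a nearest-neighbour search in the eigenspace. This is a minimal sketch only: the record type `StoredHand`, the field names, and the choice of Euclidean distance as the measure of closeness are assumptions, since the passage above does not fix a distance measure.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredHand:
    """One entry of the hand shape image information storage part (hypothetical layout)."""
    coords: tuple    # eigenspace projection coordinates
    shape: str       # shape information
    position: tuple  # position information, e.g. orientation angles


def select_closest(input_coords, stored):
    """Return the stored entry whose projection coordinates are closest to the input."""
    return min(stored, key=lambda h: math.dist(input_coords, h.coords))


def shape_and_position(input_coords, stored):
    """Output the shape and position information of the closest stored image."""
    best = select_closest(input_coords, stored)
    return best.shape, best.position
```

Because every stored image is compared with the input, the cost grows linearly with the number of stored hand shape images.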

[0026] As described above, in the first aspect, a plu- 
rality of hand images varied in hand shape and position 
and an input hand image for recognition are all sub- 
jected to wrist region deletion before normalization. 
Therefore, the hand images can be normalized with 
higher accuracy compared to a case where the hand 
images are simply subjected to normalization in size 
and contrast. Accordingly, under a method based on the 
eigenspace, the hand shape and position can be recog- 
nized with accuracy of a sufficient degree. 
[0027] Further, by using the method based on the 
eigenspace, geometric characteristics such as the 
number of extended fingers can be recognized, 
whereby rather complicated hand shapes having few
geometric characteristics can be correctly recognized.
[0028] A second aspect of the present invention is 
directed to a device for recognizing hand shape and 
position of a hand image obtained by optical read 
means (hereinafter, referred to as input hand image), 
the device comprising: 

a first hand image normalization part for receiving a 
plurality of hand images varied in hand shape and 
position, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images to 
normalization in a predetermined manner (in hand 
orientation, image size, image contrast) to generate 
hand shape images; 

a hand shape image information storage part for 
storing the hand shape images together with shape 
information and position information about each of 
the hand shape images; 

an eigenspace calculation part for calculating an 
eigenvalue and an eigenvector from each of the 
hand shape images under analysis based on an 
eigenspace method; 

an eigenvector storage part for storing the eigen- 
vectors; 

a first eigenspace projection part for calculating 
eigenspace projection coordinates respectively for 
the hand shape images by projecting the hand 
shape images onto an eigenspace having the 
eigenvectors as a basis, and storing the 
eigenspace projection coordinates into the hand 
shape image information storage part; 
a cluster evaluation part for classifying, into clus- 
ters, the eigenspace projection coordinates under 
cluster evaluation, determining which of the hand 
shape images belongs to which cluster for storage 
into the hand shape image information storage 
part, and obtaining statistical information about 
each cluster; 

a cluster information storage part for storing each of 
the statistical information together with the cluster
corresponding thereto; 

a second hand image normalization part for receiv- 
ing the input hand image, and after a wrist region is 
deleted therefrom, normalizing the input hand 
image to generate an input hand shape image 
being equivalent to the hand shape images; 
a second eigenspace projection part for calculating 
eigenspace projection coordinates for the input 
hand shape image by projecting the input hand 
shape image onto the eigenspace having the 
eigenvectors as the basis; 

a maximum likelihood cluster judgement part for 
comparing the eigenspace projection coordinates 
calculated by the second eigenspace projection 
part with each of coordinates included in the statis- 
tical information stored in the cluster information 
storage part, and determining which cluster is the 
closest; 

an image comparison part for comparing the hand 
shape images included in the closest cluster with 
the input hand shape image, and determining 
which of the hand shape images is analogous most 
closely to the input hand shape image; and 
a shape/position output part for obtaining, for out- 
put, the shape information and the position informa- 
tion on the most analogous hand shape image from 
the hand shape image information storage part.

[0029] As described above, in the second aspect, 
the hand shape images stored in the hand shape image 
information storage part are classified into clusters, 
under cluster evaluation in the eigenspace. Thereafter, 
it is decided to which cluster an input hand image 
belongs, and then is decided which hand shape image 
in the cluster is the closest to the input hand image. In 
this manner, the frequency of comparison for matching 
can be reduced and the processing speed can be 
improved. Further, it is possible to accurately define 
each image by hand shape and position even if the 
images are analogous in hand position from a certain 
direction but different in hand shape. 
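
The two-stage search of the second aspect can be sketched as follows. The sketch assumes that the statistical information of each cluster is its mean coordinate vector, that closeness is Euclidean distance, and that images are compared by the sum of squared pixel differences; none of these choices is stated in the passage above, and all names are hypothetical.

```python
import math


def nearest_cluster(coords, cluster_means):
    """Return the id of the cluster whose mean is closest to the input coordinates.

    cluster_means maps a cluster id to its mean eigenspace coordinates.
    """
    return min(cluster_means, key=lambda cid: math.dist(coords, cluster_means[cid]))


def most_similar_image(input_image, candidates):
    """Return the label of the candidate image most similar to the input image.

    candidates is a list of (label, image) pairs; images are flat pixel lists
    of equal length.
    """
    def ssd(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(candidates, key=lambda item: ssd(input_image, item[1]))[0]


def recognize(coords, input_image, cluster_means, cluster_members):
    """Stage 1: pick the closest cluster. Stage 2: compare images in that cluster only."""
    cid = nearest_cluster(coords, cluster_means)
    return cid, most_similar_image(input_image, cluster_members[cid])
```

Only the images belonging to the selected cluster are compared with the input, which is where the reduction in the number of comparisons comes from.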
[0030] According to a third aspect, in the second 
aspect, the image comparison part includes: 

an identical shape classification part for classifying, 
according to hand shape, the hand shape images 
included in the cluster determined by the maximum 
likelihood cluster judgement part into groups before 
comparing the hand shape images with the input 
hand shape image generated by the second hand 
image normalization part; 

a shape group statistic calculation part for calculat- 
ing a statistic representing the groups; and 
a maximum likelihood shape judgement part for cal- 
culating a distance between the input hand shape 
image and the statistic, and outputting a hand 
shape included in the closest group. 



[0031] As described above, in the third aspect, in a
case where it suffices to define the hand shape images
only by hand shape, the hand shape can be recognized
more accurately than in a case where the hand
shape and the hand position are both recognized.

[0032] According to a fourth aspect, in the second
aspect, the cluster evaluation part obtains the hand
shape images and the shape information for each cluster
from the hand shape image information storage part,
calculates a partial region respectively for the hand
shape images for discrimination, and stores the partial
regions into the cluster information storage part; and

the image comparison part compares the hand
shape images in the cluster determined by the maximum
likelihood cluster judgement part with the
input hand shape image generated by the second
hand image normalization part only in the partial
region corresponding to the cluster.

[0033] As described above, in the fourth aspect, a
partial region is predetermined, and the comparison for
matching between the hand shape images and the input
hand shape image is done for the parts within the partial
region. In this manner, the comparison for matching can
be less frequent than in the second aspect, and accordingly
still higher-speed processing can be achieved with
a higher degree of accuracy even if the images are analogous
in hand position from a certain direction but different
in hand shape.

[0034] According to a fifth aspect, in the second
aspect,

when the input hand image is plurally provided by
photographing a hand from several directions,

the second hand image normalization part generates
the input hand shape image for each of the
input hand images,

the second eigenspace projection part calculates
the eigenspace projection coordinates in the
eigenspace respectively for the input hand shape
images generated by the second hand image normalization
part,

the maximum likelihood cluster judgement part
compares each of the eigenspace projection coordinates
calculated by the second eigenspace projection
part with the statistical information, and
determines which cluster is the closest, and

the image comparison part merges the closest
clusters determined by the maximum likelihood
cluster judgement part, and estimates hand shape
and position consistent with the shape information
and the position information about the hand shape
images in each of the clusters.

[0035] As described above, in the fifth aspect, input 
hand images obtained from a plurality of cameras can 
be defined by hand shape and position by merging clusters,
based on the closeness in distance thereamong,
determined for each of the input hand images. In this 
manner, even a hand image which has been difficult to 
recognize from one direction (e.g., a hand image from 
the side) can be defined by hand shape and position 
with accuracy. 

[0036] A sixth aspect of the present invention is
directed to a device for recognizing a meaning of successive
hand images (hereinafter, referred to as hand movement image)
obtained by optical read means, the device comprising:

a first hand image normalization part for receiving a 
plurality of hand images varied in hand shape and 
position, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images to 
normalization in a predetermined manner (in hand 
orientation, image size, image contrast) to generate 
hand shape images; 

a hand shape image information storage part for 
storing the hand shape images together with shape 
information and position information about each of 
the hand shape images; 

an eigenspace calculation part for calculating an 
eigenvalue and an eigenvector from each of the 
hand shape images under analysis based on an 
eigenspace method; 

an eigenvector storage part for storing the eigen- 
vectors; 

a first eigenspace projection part for calculating 
eigenspace projection coordinates respectively for 
the hand shape images by projecting the hand 
shape images onto an eigenspace having the 
eigenvectors as a basis, and storing the 
eigenspace projection coordinates into the hand 
shape image information storage part; 
a cluster evaluation part for classifying, into clus- 
ters, the eigenspace projection coordinates under 
cluster evaluation, determining which of the hand 
shape images belongs to which cluster for storage 
into the hand shape image information storage 
part, and obtaining statistical information about 
each cluster; 

a cluster information storage part for storing each of 
the statistical information together with the cluster 
corresponding thereto; 

a hand region detection part for receiving the hand 
movement image, and detecting a hand region 
respectively from the hand images structuring the 
hand movement image; 

a hand movement segmentation part for determin- 
ing how the hand is moved in each of the detected 
hand regions, and finding any change point in hand 
movement according thereto; 
a hand image cutting part for cutting an image cor- 
responding to the detected hand region respec- 
tively from the images including the change points; 
a second hand image normalization part for respectively
normalizing one or more hand images (hereinafter,
referred to as hand image series) cut from
the hand movement image by the hand image cutting
part, after a wrist region is deleted from each thereof,
and generating input hand shape images
being equivalent to the hand shape images;

a second eigenspace projection part for calculating
eigenspace projection coordinates for each of the
input hand shape images by projecting the input
hand shape images onto the eigenspace having the
eigenvectors as the basis;

a maximum likelihood cluster judgement part for
comparing each of the eigenspace projection coordinates
calculated by the second eigenspace projection
part with the statistical information stored in
the cluster information storage part, determining
which cluster is the closest to each of the
eigenspace projection coordinates, and outputting
symbols each specifying one of the clusters;

a series registration part for registering, in a series
identification dictionary part, the symbols (hereinafter,
referred to as symbol series) corresponding to the
hand image series outputted by the maximum likelihood
cluster judgement part together with a meaning
of the hand movement image;

the series identification dictionary part for storing
the meaning of the hand movement image and the
symbol series corresponding thereto; and

an identification operation part for obtaining, for output,
one of the meanings corresponding to the symbol
series outputted by the maximum likelihood
cluster judgement part from the series identification
dictionary part.

[0037] As described above, in the sixth aspect, the
meaning of the hand movement successively made to
carry a meaning in gesture or sign language is previously
stored together with a cluster series created from
some images including the change points. Thereafter,
at the time of recognizing the hand movement image,
the cluster series is referred to for outputting the stored
meaning. In this manner, the hand movement successively
made to carry the meaning in gesture or sign language
can be recognized with higher accuracy, and
accordingly can be correctly caught in meaning.
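
The registration and lookup of symbol series in the sixth aspect can be sketched as a dictionary keyed by the sequence of cluster symbols. This is a minimal sketch assuming exact matching of the whole series; the passage above does not say how partial or noisy matches are handled, and the function names are hypothetical.

```python
def register_series(dictionary, symbol_series, meaning):
    """Store a symbol series (cluster symbols at the change points) with its meaning."""
    dictionary[tuple(symbol_series)] = meaning


def identify(dictionary, symbol_series):
    """Return the meaning registered for the symbol series, or None if unknown."""
    return dictionary.get(tuple(symbol_series))
```

A series such as `["C3", "C7", "C1"]` would be registered once with its meaning and later retrieved when the maximum likelihood cluster judgement part outputs the same series.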
[0038] According to a seventh aspect, in the sixth
aspect, the device further comprises:

a comprehensive movement recognition part for
receiving the hand movement image, and outputting
a possibility for meaning by judging how the
hand is moved and where the hand is located in the
hand movement image; and

a restriction condition storage part for previously
storing a restriction condition for restricting, according
to the successive hand movement, the meaning
of the provided hand movement image, wherein
the identification operation part obtains, for output,
while taking the restriction condition into consideration,
a meaning corresponding to the symbol series
outputted by the maximum likelihood cluster judgement
part from the series identification dictionary
part.

[0039] As described above, in the seventh aspect, 
the restriction conditions relevant to the comprehensive 
hand movement are additionally imposed, and the hand 
movement image is defined by meaning. In this manner, 
the hand movement image can be recognized with 
higher accuracy. 

[0040] According to eighth and ninth aspects,
in the sixth and the seventh aspects,

the hand region detection part includes: 
a possible region cutting part for cutting a possible 
hand region from the hand images structuring the 
input hand movement image; 
a masking region storage part for storing a masking 
region used to extract only the possible hand region 
from an image of a rectangular region; 
a hand region image normalization part for super- 
imposing the masking region on each of the possi- 
ble hand regions cut from the images structuring 
the hand movement image, and normalizing each 
thereof to generate an image equivalent to the hand 
images used to calculate the eigenvectors; 
a hand region eigenspace projection part for calcu- 
lating eigenspace projection coordinates for the 
normalized images by projecting the images onto 
the eigenspace having the eigenvectors as the 
basis;

a hand region maximum likelihood cluster judge- 
ment part for comparing each of the eigenspace 
projection coordinates calculated by the hand 
region eigenspace projection part with the statisti- 
cal information stored in the cluster information 
storage part, determining which cluster is the closest
to each of the eigenspace projection coordinates,
and outputting an estimation value indicating
closeness between each of the symbols specifying
the cluster and a cluster for reference; and
a region determination part for outputting, according
to the estimation values, position information on
the possible hand region whose estimation
value is the highest and the cluster thereof.
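
The region determination step can be sketched as scoring every possible hand region and keeping the best one. The sketch assumes that the estimation value is the negative Euclidean distance to the closest cluster mean, so that a higher value means a closer match; the passage above does not define the estimation value, and all names are hypothetical.

```python
import math


def best_region(candidates, cluster_means):
    """Return (position, cluster_id, estimation_value) for the best candidate region.

    candidates is a list of (position, coords) pairs, where coords are the
    eigenspace projection coordinates of the normalized candidate region.
    cluster_means maps a cluster id to its mean eigenspace coordinates.
    """
    best = None
    for position, coords in candidates:
        cid = min(cluster_means, key=lambda c: math.dist(coords, cluster_means[c]))
        estimate = -math.dist(coords, cluster_means[cid])  # higher means closer
        if best is None or estimate > best[2]:
            best = (position, cid, estimate)
    return best
```

The function returns both the region position and its cluster, mirroring the point that the hand region and the cluster are determined together.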

[0041] As described above, in the eighth and ninth 
aspects, the hand region is detected by projecting the 
possible hand region onto the eigenspace and then 
selecting the appropriate cluster. In this manner, the 
hand region and the cluster therefor can be simultane- 
ously determined. Accordingly, the hand region can be 
concurrently detected with the hand shape/position, or 
with the hand movement. 

[0042] According to tenth to twelfth aspects, in
the first, the second, and the sixth aspects,



the first hand image normalization part and the sec- 
ond hand image normalization part respectively 
include: 

a color distribution storage part for previously stor- 
5 ing a color distribution of the hand region to be 

extracted from the input hand image; 
a hand region extraction part for extracting the hand 
region from an input hand image according to the 
color distribution; 
a wrist region deletion part for finding which direction
a wrist is oriented, and deleting a wrist region
from the hand region according to the direction; 
a region displacement part for displacing the hand 
region from which the wrist region is deleted to a 
predetermined location on the image;

a rotation angle calculation part for calculating a 
rotation angle in such a manner that the hand in the 
hand region is oriented to a predetermined direc- 
tion; 

a region rotation part for rotating, according to the
rotation angle, the hand region in such a manner 
that the hand therein is oriented to a direction; and 
a size normalization part for normalizing the rotated 
hand region to be in a predetermined size. 
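The displacement and rotation angle parts could be sketched as follows. The specification does not say here how the rotation angle is calculated, so this sketch assumes the principal axis of the binary hand region, obtained from second-order image moments, and assumes the predetermined location is the image center. The function names are hypothetical.

```python
import numpy as np

def centroid_and_axis_angle(mask):
    """Return ((row, col) centroid, principal-axis angle in radians)."""
    rows, cols = np.nonzero(mask)
    cy, cx = rows.mean(), cols.mean()
    dx, dy = cols - cx, rows - cy
    # Central second-order moments of the region.
    mu20, mu02, mu11 = (dx * dx).mean(), (dy * dy).mean(), (dx * dy).mean()
    angle = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (cy, cx), angle

def displacement_to_center(mask):
    """Shift (rows, cols) that moves the region centroid to the image center."""
    (cy, cx), _ = centroid_and_axis_angle(mask)
    h, w = mask.shape
    return (h - 1) / 2.0 - cy, (w - 1) / 2.0 - cx

mask = np.zeros((9, 9), dtype=bool)
mask[1, 0:5] = True  # a horizontal bar near the top-left corner
(cy, cx), angle = centroid_and_axis_angle(mask)
shift = displacement_to_center(mask)
```

Rotating the region by the negated angle and rescaling it to a fixed size would complete the normalization; those steps are omitted to keep the sketch short.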


[0043] As described above, in the tenth to twelfth 
aspects, when normalizing the hand image, in addition 
to the deletion of the wrist region, the hand region is 
extracted based on color (beige). In this manner, the
hand can be photographed with a non-artificial background,
and from the image taken in thereby, the hand
region can be extracted, and therefore the hand shape 
and position can be recognized with higher accuracy. 
[0044] According to a thirteenth aspect, in the first
aspect, the device further comprises:

an instruction storage part for storing an instruction 
corresponding respectively to the shape informa- 
tion and the position information; and 

an instruction output part for receiving the shape
information and the position information provided 
by the shape/position output part, and obtaining, for 
output, the instruction respectively corresponding 
to the shape information and the position information
from the instruction storage part.

[0045] As described above, in the thirteenth aspect, 
the device in the first aspect can be used as an interface 
for other devices according to the hand shape and position.

[0046] A fourteenth aspect of the present invention 
is directed to a method for recognizing hand shape and 
position of a hand image obtained by optical read 
means (hereinafter, referred to as input hand image), 
the method comprising:

a first normalization step of receiving a plurality of 
hand images varied in hand shape and position, 
and after a wrist region is respectively deleted
therefrom, subjecting the hand images to normali- 
zation in a predetermined manner (in hand orienta- 
tion, image size, image contrast) to generate hand 
shape images;
an analysis step of calculating an eigenvalue and 
an eigenvector from each of the hand shape 
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace projection
coordinates respectively for the hand shape
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
a second normalization step of receiving the input 
hand image, and after a wrist region is deleted
therefrom, normalizing the input hand image to 
generate an input hand shape image being equiva- 
lent to the hand shape images; 
a second projection step of calculating eigenspace 
projection coordinates for the input hand shape
image by projecting the input hand shape image 
onto the eigenspace having the eigenvectors as the 
basis; 

a comparison step of comparing the eigenspace 
projection coordinates calculated for the hand
shape images with the eigenspace projection coor- 
dinates calculated for the input hand shape image, 
and determining which of the hand shape images is 
closest to the input hand shape image; and
a step of outputting the shape information and the
position information on the closest hand shape 
image. 
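The analysis, projection, and comparison steps above can be sketched with a small principal component computation. This is a minimal illustration, not the patented implementation: it assumes the hand shape images are already normalized and flattened into vectors, uses a singular value decomposition to obtain the eigenvectors, and compares projection coordinates by Euclidean distance. The function names, the random stand-in images, and the label strings are hypothetical.

```python
import numpy as np

def build_eigenspace(images, k):
    """images: (n, d) array of flattened, normalized hand shape images."""
    mean = images.mean(axis=0)
    centered = images - mean
    # The rows of vt are the eigenvectors of the sample covariance matrix;
    # the SVD avoids forming the d x d covariance matrix explicitly.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = (s ** 2) / max(len(images) - 1, 1)
    return mean, vt[:k], eigenvalues[:k]

def project(image, mean, basis):
    """Eigenspace projection coordinates of one flattened image."""
    return basis @ (image - mean)

def closest_index(input_image, mean, basis, stored_coords):
    """Index of the stored hand shape image closest in the eigenspace."""
    q = project(input_image, mean, basis)
    return int(np.argmin(np.linalg.norm(stored_coords - q, axis=1)))

rng = np.random.default_rng(0)
train = rng.random((6, 64))                  # stand-ins for 8x8 hand shape images
labels = [("shape_%d" % i, "position_%d" % i) for i in range(6)]
mean, basis, eigenvalues = build_eigenspace(train, k=5)
coords = np.array([project(img, mean, basis) for img in train])

query = train[4] + 0.01 * rng.standard_normal(64)   # noisy copy of image 4
best = closest_index(query, mean, basis, coords)
shape_info, position_info = labels[best]
```

In practice k would be chosen much smaller than the number of stored images, typically by keeping the eigenvectors with the largest eigenvalues.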

[0047] As described above, in the fourteenth 
aspect, a plurality of hand images varied in hand shape
and position and an input hand image for recognition 
are all subjected to wrist region deletion before normalization.
Therefore, the hand images can be normalized
with higher accuracy compared to a case where the 
hand images are simply subjected to normalization in
size and contrast. Accordingly, under a method based 
on the eigenspace, the hand shape and position can be 
recognized with accuracy of a sufficient degree. 
[0048] Further, by using the method based on the 
eigenspace, geometric characteristics such as the
number of extended fingers can be recognized,
whereby rather complicated hand shapes having few
geometric characteristics can be correctly recognized. 
[0049] A fifteenth aspect of the present invention is 
directed to a method for recognizing hand shape and
position of a hand image obtained by optical read 
means (hereinafter, referred to as input hand image), 
the method comprising: 

a first normalization step of receiving a plurality of
hand images varied in hand shape and position, 
and after a wrist region is respectively deleted 
therefrom, subjecting the hand images to normalization
in a predetermined manner (in hand orientation,
image size, image contrast) to generate hand
shape images; 

an analysis step of calculating an eigenvalue and 
an eigenvector from each of the hand shape 
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace pro- 
jection coordinates respectively for the hand shape 
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
an evaluation step of classifying, under cluster eval- 
uation, the eigenspace projection coordinates into 
clusters, determining which of the hand shape 
images belongs to which cluster, and obtaining statistical
information about each of the clusters;
a second normalization step of receiving the input 
hand image, and after a wrist region is deleted 
therefrom, normalizing the input hand image to 
generate an input hand shape image being equiva- 
lent to the hand shape images; 
a second projection step of calculating eigenspace 
projection coordinates for the input hand shape 
image by projecting the input hand shape image 
onto the eigenspace having the eigenvectors as the 
basis; 

a judgement step of comparing the eigenspace pro- 
jection coordinates calculated for the input hand 
shape image with each of the statistical information, 
and determining the closest cluster; 
a comparison step of comparing each of the hand 
shape images included in the closest cluster with 
the input hand shape image, and determining 
which of the hand shape images is most analogous 
to the input hand shape image; and
a step of outputting the shape information and the 
position information on the most analogous hand 
shape image. 

[0050] As described above, in the fifteenth aspect, 
the hand shape images are classified into clusters, 
under cluster evaluation. Thereafter, it is decided to 
which cluster an input hand image belongs, and then it is
decided which hand shape image in the cluster is the
closest to the input hand image. In this manner, the fre- 
quency of comparison for matching can be reduced and 
the processing speed can be improved. Further, it is 
possible to accurately define each image by hand shape 
and position even if the images are analogous in hand 
position from a certain direction but different in hand 
shape. 
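The two-stage search described in this aspect might look like the following sketch. It assumes the cluster assignment of each stored hand shape image is already available, and it takes the centroid of each cluster as the statistical information; the specification leaves both the clustering method and the statistic open at this point. The function names and coordinates are hypothetical.

```python
import numpy as np

def cluster_statistics(coords, assignments):
    """Return {cluster_id: centroid}; the centroid stands in for the statistic."""
    return {int(c): coords[assignments == c].mean(axis=0)
            for c in np.unique(assignments)}

def two_stage_match(query, coords, assignments, centroids):
    # Judgement step: closest cluster by distance to its centroid.
    cluster = min(centroids, key=lambda c: np.linalg.norm(centroids[c] - query))
    # Comparison step: only members of that cluster are compared.
    members = np.flatnonzero(assignments == cluster)
    dists = np.linalg.norm(coords[members] - query, axis=1)
    return cluster, int(members[np.argmin(dists)])

# Illustrative eigenspace projection coordinates for five stored images.
coords = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]])
assignments = np.array([0, 0, 1, 1, 2])
centroids = cluster_statistics(coords, assignments)
cluster, index = two_stage_match(np.array([5.15, 4.95]), coords, assignments, centroids)
```

Here three centroid comparisons and two member comparisons replace five direct comparisons; the saving grows with the number of stored images per cluster.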

[0051] According to a sixteenth aspect, in the fif- 
teenth aspect, 

the comparison step includes, 
a step of classifying, into clusters, the hand shape 
images included in the cluster determined in the 
judgement step before comparing the hand shape 
images with the input hand shape image generated
in the second normalization step; 
a step of calculating a statistic representing the 
clusters; and 

a step of calculating a distance between the input
hand shape image and the statistic, and outputting 
a hand shape included in the closest cluster. 

[0052] As described above, in the sixteenth aspect, 
in a case where it is enough for the hand shape images to
be defined only by hand shape, the hand shape can be
recognized more accurately than in a case where the hand
shape and the hand position are both recognized. 
[0053] According to a seventeenth aspect, in the fifteenth
aspect,

in the evaluation step, according to the hand shape 
images and the shape information, a partial region 
is calculated respectively for the hand shape 
images for discrimination, and
in the comparison step, the hand shape images in 
the cluster determined in the judgement step are 
compared with the input hand shape image gener- 
ated in the second normalization step only in the 
partial region corresponding to the cluster.
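Restricting the comparison to a partial region can be sketched as a masked image distance. How the partial region is derived is not detailed at this point, so the sketch assumes, purely for illustration, that it consists of the pixels in which the images of a cluster differ from one another. The function names are hypothetical.

```python
import numpy as np

def discriminative_mask(cluster_images, threshold=0.0):
    """Assumed partial region: pixels whose value varies within the cluster."""
    stack = np.stack(cluster_images)
    spread = stack.max(axis=0) - stack.min(axis=0)
    return spread > threshold

def closest_in_region(input_image, cluster_images, mask):
    """Index of the cluster image closest to the input inside the mask only."""
    errors = [float(np.sum((input_image - img)[mask] ** 2))
              for img in cluster_images]
    return int(np.argmin(errors))

a = np.zeros((4, 4))
b = np.zeros((4, 4))
b[0, 1] = 1.0                    # the two images differ in a single pixel
query = b.copy()
query[3, :] = 0.3                # disturbance outside the partial region
mask = discriminative_mask([a, b])
best = closest_in_region(query, [a, b], mask)
```

Only the masked pixels enter the sum, so both the amount of computation and the influence of irrelevant image areas shrink.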

[0054] As described above, in the seventeenth 
aspect, a partial region is predetermined, and the com- 
parison for matching between the hand shape images 
and the input hand shape image is done for the parts
within the partial region. In this manner, the comparison 
for matching can be less frequent than the fifteenth 
aspect, and accordingly still higher-speed processing 
can be achieved with a higher degree of accuracy even 
if the images are analogous in hand position from a certain
direction but different in hand shape.
[0055] According to an eighteenth aspect, in the fif- 
teenth aspect, 

when a plurality of input hand images are provided by
photographing a hand from several directions, 
in the second normalization step, the input hand 
shape image is generated for each of the input 
hand images, 

in the second projection step, eigenspace projection
coordinates in the eigenspace are calculated
respectively for the input hand shape images gen- 
erated in the second normalization step, 
in the judgement step, each of the eigenspace projection
coordinates calculated in the second projection
step is compared with the statistical
information, and the closest cluster is determined, 
and 

in the comparison step, the closest clusters determined
in the judgement step are merged, and hand
shape and position consistent with the shape information
and the position information about the hand
shape images in each of the clusters are estimated.
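One simple reading of the merging in the comparison step is a set intersection over the closest clusters found for each camera. The sketch below assumes that every stored hand shape image carries a (shape, position) label expressed in a frame common to all cameras, so that labels from different views can be compared directly. That assumption, the label strings, and the function name are illustrative only.

```python
def merge_clusters(per_camera_clusters):
    """Keep only (shape, position) labels present in every camera's closest cluster."""
    label_sets = [set(labels) for labels in per_camera_clusters]
    if not label_sets:
        return set()
    return set.intersection(*label_sets)

# Labels of the hand shape images in the closest cluster for each view.
front_view = [("fist", "palm_down"), ("flat", "palm_down"), ("flat", "thumb_up")]
side_view = [("flat", "thumb_up"), ("point", "thumb_up")]
consistent = merge_clusters([front_view, side_view])
```

A view that is ambiguous on its own, such as a hand seen from the side, is thereby disambiguated by the candidates that survive from the other views.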



[0056] As described above, in the eighteenth 
aspect, input hand images obtained from a plurality of 
cameras can be defined by hand shape and position by 
merging clusters, based on the closeness in distance 
thereamong, determined for each of the input hand 
images. In this manner, even a hand image which has 
been difficult to recognize from one direction (e.g., a 
hand image from the side) can be defined by hand 
shape and position with accuracy. 
[0057] A nineteenth aspect of the present invention 
is directed to a method for recognizing a meaning of 
successive hand images (hereinafter, referred to collectively
as hand movement image) obtained by optical
read means, the method comprising:

a first normalization step of receiving a plurality of 
hand images varied in hand shape and position, 
and after a wrist region is respectively deleted 
therefrom, subjecting the hand images to normali- 
zation in a predetermined manner (in hand orienta- 
tion, image size, image contrast) to generate hand 
shape images; 

an analysis step of calculating an eigenvalue and 
an eigenvector from each of the hand shape 
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace pro- 
jection coordinates respectively for the hand shape 
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
an evaluation step of classifying, into clusters, the 
eigenspace projection coordinates under cluster 
evaluation, determining which of the hand shape 
images belongs to which cluster, and obtaining sta- 
tistical information about each cluster; 
a detection step of receiving the hand movement 
image, and detecting a hand region respectively 
from the images structuring the hand movement 
image; 

a segmentation step of determining how the hand is 
moved in each of the detected hand regions, and 
finding any change point in hand movement accord- 
ing thereto; 

a cutting step of cutting an image corresponding to 
the detected hand region respectively from the 
images including the change points; 
a second normalization step of respectively normal- 
izing one or more hand images (hereinafter, 
referred to as hand image series) cut from the hand 
movement image, after a wrist region is each 
deleted therefrom, and generating input hand 
shape images being equivalent to the hand shape 
images; 

a second projection step of calculating eigenspace 
projection coordinates for each of the input hand 
shape images by projecting the input hand shape 
images onto the eigenspace having the eigenvec- 
tors as the basis; 
a judgement step of comparing each of the
eigenspace projection coordinates calculated for 
the input hand shape images with the statistical 
information, determining which cluster is the clos- 
est, and outputting a symbol each specifying the 
clusters; 

a step of storing the symbols (hereinafter, referred
to as symbol series) corresponding to the judged hand
image series together with a meaning of the hand 
movement image; and 

an identification step of outputting, in order to iden- 
tify the hand movement image, a meaning corre- 
sponding to the judged symbol series based on the 
stored symbol series and meaning. 

[0058] As described above, in the nineteenth
aspect, the meaning of the hand movement succes- 
sively made to carry a meaning in gesture or sign lan- 
guage is previously stored together with a cluster series 
created from some images including the change points. 
Thereafter, at the time of recognizing the hand move- 
ment image, the cluster series is referred to for output- 
ting the stored meaning. In this manner, the hand 
movement successively made to carry the meaning in 
gesture or sign language can be recognized with higher 
accuracy, and accordingly can be correctly caught in 
meaning. 
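Storing symbol series together with their meanings and later looking one up can be sketched with a dictionary keyed by the series. The cluster symbols, the meanings, and the function names below are hypothetical, and exact matching is an assumption; the specification does not state here how partial or noisy series are handled.

```python
def store_series(dictionary, symbol_series, meaning):
    """Register a symbol series (one cluster symbol per change point) with its meaning."""
    dictionary[tuple(symbol_series)] = meaning

def identify(dictionary, symbol_series):
    """Return the stored meaning for an exactly matching series, or None."""
    return dictionary.get(tuple(symbol_series))

dictionary = {}
store_series(dictionary, ["C3", "C7", "C1"], "hello")
store_series(dictionary, ["C3", "C2"], "thanks")

meaning = identify(dictionary, ["C3", "C7", "C1"])
unknown = identify(dictionary, ["C9"])
```

Because only the images at change points contribute symbols, the stored series stay short even for long hand movement images.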

[0059] According to a twentieth aspect, in the nine- 
teenth aspect, the method further comprises: 

a recognition step of receiving the hand movement 
image, and outputting a possibility for meaning by 
judging how the hand is moved and where the hand 
is located in the hand movement image; and 
a storage step of previously storing a restriction 
condition for restricting, according to the succes- 
sive hand movement, the meaning of the provided 
hand movement image, wherein 
the identification step outputs, while taking the
restriction condition into consideration, a meaning 
corresponding to the judged symbol series based 
on the stored symbol series and meaning. 

[0060] As described above, in the twentieth aspect, 
the restriction conditions relevant to the comprehensive 
hand movement are additionally imposed, and the hand 
movement image is defined by meaning. In this manner, 
the hand movement image can be recognized with 
higher accuracy. 

[0061] According to twenty-first and twenty-second
aspects, in the nineteenth and the twentieth
aspects, 

the detection step includes: 

a cutting step of cutting a possible hand region from 
each hand image structuring the input hand move- 
ment image; 

a storage step of storing a masking region used to
extract only the possible hand region from an image
of a rectangular region; 

a normalization step of superimposing the masking 
region on each of the possible hand regions cut
from each hand image structuring the hand movement
image, and normalizing each thereof to generate
an image equivalent to the hand images used
to calculate the eigenvectors; 
a projection step of calculating eigenspace projection
coordinates for the normalized images by projecting
the images onto the eigenspace having the
eigenvectors as the basis; 

a judgement step of comparing each of the 
eigenspace projection coordinates with the statistical
information, determining which cluster is the
closest, and outputting an estimate value indicating 
closeness between each of the symbols specifying 
the cluster and a cluster for reference; and 
a determination step of outputting, according to the
estimation values, position information on the possible
hand region whose estimation value is the
highest and the cluster thereof. 

[0062] As described above, in the twenty-first and 
the twenty-second aspects, the hand region is detected
by projecting the possible hand region onto the
eigenspace and then selecting the appropriate cluster.
In this manner, the hand region and the cluster therefor
can be simultaneously determined. Accordingly, the
hand region can be concurrently detected with the hand
shape/position, or with the hand movement. 
[0063] According to twenty-third to twenty-fifth
aspects, in the fourteenth, the fifteenth, and the nineteenth
aspects,


the first normalization step and the second normal- 
ization step respectively include: 
a color storage step of previously storing a color 
distribution of the hand region to be extracted from
the input hand image;

a step of extracting the hand region from an input 
hand image according to the color distribution; 
a step of finding which direction a wrist is oriented, 
and deleting a wrist region from the hand region
according to the direction;

a step of displacing the hand region from which the 
wrist region is deleted to a predetermined location 
on the image; 

a step of calculating a rotation angle in such a manner
that the hand in the hand region is oriented to a
predetermined direction; 

a step of rotating, according to the rotation angle, 
the hand region in such a manner that the hand 
therein is oriented to a direction; and 
a step of normalizing the rotated hand region to be
in a predetermined size. 

[0064] As described above, in the twenty-third to 
twenty-fifth aspects, when normalizing the hand image,
in addition to the deletion of the wrist region, the hand 
region is extracted based on color (beige). In this manner,
the hand can be photographed with a non-artificial
background, and from the image taken in thereby, the
hand shape and position can be recognized with higher 
accuracy. 

[0065] According to a twenty-sixth aspect, in the
fourteenth aspect, the method further comprises:


an instruction storage step of storing an instruction 
corresponding respectively to the shape informa- 
tion and the position information; and 
a step of receiving the shape information and the 
position information outputted in the output step,
and obtaining, for output, the instruction respec- 
tively corresponding to the shape information and 
the position information stored in the instruction 
storage step. 


[0066] As described above, in the twenty-sixth 
aspect, the method in the fourteenth aspect can be 
used as an interface for other devices according to the 
hand shape and position. 

[0067] A twenty-seventh aspect of the present
invention is directed to a recording medium storing
a program to be executed on a computer device for carrying
out a method for recognizing hand shape and
position of a hand image obtained by optical read 
means (hereinafter, referred to as input hand image),
the program being for realizing an operational environ- 
ment on the computer device including: 

a first normalization step of receiving a plurality of 
hand images varied in hand shape and position,
and after a wrist region is respectively deleted 
therefrom, subjecting the hand images to normali- 
zation in a predetermined manner (in hand orienta- 
tion, image size, image contrast) to generate hand 
shape images;
an analysis step of calculating an eigenvalue and 
an eigenvector from each of the hand shape
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace projection
coordinates respectively for the hand shape
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
a second normalization step of receiving the input 
hand image, and after a wrist region is deleted
therefrom, normalizing the input hand image to 
generate an input hand shape image being equiva- 
lent to the hand shape images; 
a second projection step of calculating eigenspace 
projection coordinates for the input hand shape
image by projecting the input hand shape image 
onto the eigenspace having the eigenvectors as the 
basis; 



a comparison step of comparing the eigenspace 
projection coordinates calculated for the hand 
shape images with the eigenspace projection coor- 
dinates calculated for the input hand shape image, 
and determining which of the hand shape images is 
closest to the input hand shape image; and 
a step of outputting the shape information and the 
position information on the closest hand shape 
image. 

[0068] A twenty-eighth aspect of the present invention
is directed to a recording medium storing a
program to be executed on a computer device for carrying
out a method for recognizing hand shape and position
of a hand image obtained by optical read means
(hereinafter, referred to as input hand image), the pro- 
gram being for realizing an operational environment on 
the computer device including: 

a first normalization step of receiving a plurality of 
hand images varied in hand shape and position, 
and after a wrist region is respectively deleted 
therefrom, subjecting the hand images to normali- 
zation in a predetermined manner (in hand orienta- 
tion, image size, image contrast) to generate hand 
shape images; 

an analysis step of calculating an eigenvalue and 
an eigenvector from each of the hand shape 
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace pro- 
jection coordinates respectively for the hand shape 
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
an evaluation step of classifying, into clusters, the 
eigenspace projection coordinates under cluster 
evaluation, determining which of the hand shape 
images belongs to which cluster, and obtaining sta- 
tistical information about each cluster; 
a second normalization step of receiving the input 
hand image, and after a wrist region is deleted 
therefrom, normalizing the input hand image to 
generate an input hand shape image being equiva- 
lent to the hand shape images; 
a second projection step of calculating eigenspace 
projection coordinates for the input hand shape 
image by projecting the input hand shape image 
onto the eigenspace having the eigenvectors as the 
basis; 

a judgement step of comparing the eigenspace projection
coordinates calculated for the input hand
shape image with each of coordinates included in 
the statistical information, and determining which 
cluster is the closest; 

a comparison step of comparing the hand shape 
images included in the closest cluster with the input 
hand shape image, and determining which of the 
hand shape images is most analogous to
the input hand shape image; and
a step of outputting the shape information and the 
position information on the most analogous hand 
shape image. 

[0069] According to a twenty-ninth aspect, in the
twenty-eighth aspect, the comparison step includes:

a step of classifying, into clusters, the hand shape 
images included in the cluster determined in the 
judgement step before comparing the hand shape 
images with the input hand shape image generated 
in the second normalization step; 
a step of calculating a statistic representing the 
clusters; and 

a step of calculating a distance between the input 
hand shape image and the statistic, and outputting 
a hand shape included in the closest cluster. 

[0070] According to a thirtieth aspect, in the twenty- 
eighth aspect, 

in the evaluation step, according to the hand shape 
images and the shape information, a partial region 
is calculated respectively for the hand shape 
images for discrimination, and 
in the comparison step, the hand shape images in 
the cluster determined in the judgement step are 
compared with the input hand shape image gener- 
ated in the second normalization step only in the 
partial region corresponding to the cluster. 

[0071] According to a thirty-first aspect, in the 
twenty-eighth aspect, 

when a plurality of input hand images are provided by
photographing a hand from several directions, 
in the second normalization step, the input hand 
shape image is generated for each of the input 
hand images, 

in the second projection step, eigenspace projection
coordinates in the eigenspace are calculated
respectively for the input hand shape images gen- 
erated in the second normalization step, 
in the judgement step, each of the eigenspace pro- 
jection coordinates calculated in the second projec- 
tion step is compared with the statistical 
information, and the closest cluster is determined, 
and 

in the comparison step, the closest clusters determined
in the judgement step are merged, and hand
shape and position consistent with the shape information
and the position information about the hand
shape images in each of the clusters are estimated.

[0072] A thirty-second aspect of the present invention
is directed to a recording medium storing a
program to be executed on a computer device for carrying
out a method for recognizing a meaning of successive
hand images (hereinafter, referred to collectively as
hand movement image) obtained by optical read
means, the program being for realizing an operational
environment on the computer device including: 

a first normalization step of receiving a plurality of 
hand images varied in hand shape and position, 
and after a wrist region is respectively deleted
therefrom, subjecting the hand images to normali- 
zation in a predetermined manner (in hand orienta- 
tion, image size, image contrast) to generate hand 
shape images; 

an analysis step of calculating an eigenvalue and
an eigenvector from each of the hand shape 
images under analysis based on an eigenspace 
method; 

a first projection step of calculating eigenspace projection
coordinates respectively for the hand shape
images by projecting the hand shape images onto 
an eigenspace having the eigenvectors as a basis; 
an evaluation step of classifying, into clusters, the 
eigenspace projection coordinates under cluster 
evaluation, determining which of the hand shape
images belongs to which cluster, and obtaining sta- 
tistical information about each cluster; 
a detection step of receiving the hand movement 
image, and detecting a hand region respectively 
from the hand images structuring the hand movement
image;

a segmentation step of determining how the hand is 
moved in each of the detected hand regions, and 
finding any change point in hand movement according
thereto;

a cutting step of cutting an image corresponding to 
the detected hand region respectively from the 
images including the change points; 
a second normalization step of respectively normalizing
one or more hand images (hereinafter,
referred to as hand image series) cut from the hand 
movement image, after a wrist region is each 
deleted therefrom, and generating input hand 
shape images being equivalent to the hand shape
images;

a second projection step of calculating eigenspace 
projection coordinates for each of the input hand 
shape images by projecting the input hand shape 
images onto the eigenspace having the eigenvectors
as the basis;

a judgement step of comparing each of the 
eigenspace projection coordinates calculated for 
the input hand shape images with the statistical 
information, determining which cluster is the closest,
and outputting a symbol each specifying the
clusters; 

a step of storing the symbols (hereinafter, referred
to as symbol series) corresponding to the judged hand
image series together with a meaning of the hand
movement image; and 

an identification step of outputting, in order to iden- 
tify the hand movement image, a meaning corre- 
sponding to the judged symbol series based on the 
stored symbol series and meaning. 

[0073] According to a thirty-third aspect, in the
thirty-second aspect, the method further comprises:

a recognition step of receiving the hand movement 
image, and outputting a possibility for meaning by 
judging how the hand is moved and where the hand 
is located in the hand movement image; and 
a storage step of previously storing a restriction 
condition for restricting, according to the succes- 
sive hand movement, the meaning of the provided 
hand movement image, wherein 
the identification step outputs, while taking the
restriction condition into consideration, a meaning 
corresponding to the judged symbol series based 
on the stored symbol series and meaning. 

[0074] According to thirty-fourth and thirty-fifth
aspects, in the thirty-second and the thirty-third aspects,

the detection step includes: 

a cutting step of cutting a possible hand region from 
the hand images structuring the input hand move- 
ment image; 

a storage step of storing a masking region used to 
extract only the possible hand region from an image 
of a rectangular region; 

a normalization step of superimposing the masking 
region on each of the possible hand regions cut 
from each hand image structuring the hand move- 
ment image, and normalizing each thereof to gen- 
erate an image equivalent to the hand images used 
to calculate the eigenvectors; 
a projection step of calculating eigenspace projection
coordinates for the normalized images by projecting
the images onto the eigenspace having the
eigenvectors as the basis; 

a judgement step of comparing each of the 
eigenspace projection coordinates with the statisti- 
cal information, determining which cluster is the 
closest, and outputting an estimate value indicating 
closeness between each of the symbols specifying 
the cluster and a cluster for reference; and 
a determination step of outputting, according to the 
estimation values, position information on the possible
hand region whose estimation value is the
highest and the cluster thereof. 

[0075] According to thirty-sixth to thirty-eighth
aspects, in the twenty-seventh, the twenty-eighth, and 
the thirty-second aspects, 



the first normalization step and the second normal- 
ization step respectively include: 
a color storage step of previously storing a color 
distribution of the hand region to be extracted from
the input hand image;

a step of extracting the hand region from an input 
hand image according to the color distribution; 
a step of finding which direction a wrist is oriented, 
and deleting a wrist region from the hand region
according to the direction;

a step of displacing the hand region from which the 
wrist region is deleted to a predetermined location 
on the image; 

a step of calculating a rotation angle in such a manner
that the hand in the hand region is oriented to a
predetermined direction; 

a step of rotating, according to the rotation angle, 
the hand region in such a manner that the hand 
therein is oriented to a direction; and 
a step of normalizing the rotated hand region to be
in a predetermined size. 

[0076] According to a thirty-ninth aspect, in the thirtieth
aspect, the recording medium further comprises:

an instruction storage step of storing an instruction
corresponding respectively to the shape information
and the position information; and
a step of receiving the shape information and the
position information outputted in the output step,
and obtaining, for output, the instruction respectively
corresponding to the shape information and
the position information stored in the instruction
storage step.

[0077] As described above, in the twenty-seventh to
thirty-ninth aspects, the program for carrying out the
method for recognizing hand shape and position in the
fourteenth to twenty-sixth aspects is recorded on the
recording medium. This is to supply the method in a
form of software.

[0078] These and other objects, features, aspects
and advantages of the present invention will become
more apparent from the following detailed description of
the present invention when taken in conjunction with the
accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

[0079]

FIG. 1 is a block diagram showing the structure of a
device for recognizing hand shape and position
according to a first embodiment of the present
invention;

FIG. 2 shows the outline of the processing carried
out by a hand image normalization part 11 in
FIG. 1;

FIG. 3 is an exemplary storage table provided in a
hand shape image information storage part 12A in
FIG. 1;

FIG. 4 shows the outline of an exemplary method
for calculating an eigenspace in an eigenspace calculation
part 13 in FIG. 1;

FIG. 5 shows the outline of the processing carried
out by an eigenspace projection part 15 in FIG. 1;

FIG. 6 is a diagram showing an exemplary hardware
structure which implements the device for recognizing
hand shape and position of the first
embodiment;

FIG. 7 shows an exemplary case where an instruction
storage part, which stores any instruction relevant
to shape information and position information,
is stored with instructions to an audio device;

FIG. 8 is a block diagram showing the structure of a
device for recognizing hand shape and position
according to a second embodiment of the present
invention;

FIG. 9 is an exemplary storage table provided in a
hand shape image information storage part 12B in
FIG. 8;

FIG. 10 is a flowchart exemplarily showing the
processing carried out by a cluster evaluation part
16 in FIG. 8;

FIG. 11 shows the outline of an exemplary concept
of a comparison technique carried out by an image
comparison part 26 in FIG. 8;

FIG. 12 is a block diagram showing a device for recognizing
hand shape and position according to a
third embodiment of the present invention;

FIG. 13 shows exemplary images judged as being
analogous and classified into the same cluster by
the cluster evaluation part 16 in FIG. 8;

FIG. 14 shows an exemplary concept of processing
carried out by a cluster evaluation/frame discrimination
part 18 in FIG. 12;

FIG. 15 shows an exemplary concept of determining,
in a device for recognizing hand shape and
position according to a fourth embodiment of the
present invention, a hand shape image from input
hand images obtained from a plurality of cameras;

FIG. 16 is a block diagram showing the structure of
a device for recognizing hand shape and position
according to a fifth embodiment of the present
invention;

FIG. 17 shows a concept of processing carried out
by a hand region detection part 28, a hand movement
segmentation part 29, and a hand image cutting
part 30 in FIG. 16;

FIG. 18 shows a hand image series in FIG. 16 and
an exemplary cluster series obtained therefrom;

FIG. 19 shows an exemplary storage format of
series identification dictionary 32 in FIG. 16;

FIG. 20 shows an exemplary storage format of the
series identification dictionary 32 in FIG. 16;
FIG. 21 is a block diagram showing the structure of
a device for recognizing hand shape and position
according to a sixth embodiment of the present
invention;

FIG. 22 is an exemplary storage table provided in a
hand shape image information storage part 12C in
FIG. 21;

FIG. 23 shows the outline of an exemplary method
for defining hand position;

FIG. 24 is a block diagram showing the structure of
a device for recognizing hand shape and position
according to a seventh embodiment of the present
invention;

FIG. 25 is a block diagram showing the detailed
structure of a hand region detection part provided in
a device for recognizing hand shape and position
according to an eighth embodiment of the present
invention;

FIG. 26 shows exemplary processing carried out by
a possible region cutting part 39 in FIG. 25;

FIG. 27 is a schematic diagram showing the
processing carried out by an image normalization
part 41 in FIG. 25;

FIG. 28 shows an exemplary mask region stored in
a masking region storage part 40 in FIG. 25;

FIG. 29 is a block diagram showing the detailed
structure of a hand region detection part provided in
a device for recognizing hand shape and position
according to a ninth embodiment of the present
invention;

FIG. 30 shows exemplary cluster transition information
stored in a cluster transition information storage
part 43 in FIG. 29;

FIG. 31 exemplarily shows masking regions stored
in a masking region storage part 45 in FIG. 29;

FIG. 32 is a block diagram showing, in more detail,
the structure of the hand image normalization
parts 11 and 21 provided in a device for recognizing
hand shape and position according to a tenth
embodiment of the present invention;

FIG. 33 is a diagram exemplarily showing the structure
of a storage table provided in a color distribution
storage part 61 in FIG. 32;

FIG. 34 shows the outline of processing carried out
by a rotation angle calculation part 65 in FIG. 32;

FIG. 35 shows exemplary processing carried out by
a finger characteristic emphasizing part 68 in FIG.
32;

FIG. 36 shows, in a device for recognizing hand
shape and position according to an eleventh
embodiment of the present invention, an exemplary
concept of determining, before normalization, a
hand orientation by referring to input hand images
obtained from a plurality of cameras; and

FIG. 37 shows the outline of an exemplary method
for defining hand position.
DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0080] Prior to describing embodiments of the
present invention, the terms "hand shape" and "hand
position" used in the description are defined.
[0081] Hand movement made by a human being
may carry some meaning, as in gesture or sign language.
In this case, such hand movement may be carried
out in two manners: one is by bending one or more
fingers, to some degree or completely, against the
palm to form a certain hand shape; the other is by
orienting the hand in a certain direction with the
wrist and arm joints. The shape made by the former
movement is referred to as "hand shape", and the hand
orientation determined by the latter is referred to as
"hand position".

[0082] See FIG. 37 for a stricter definition of the
hand position.

[0083] First, in a three-dimensional space where a
hand in a certain shape is observed, a local coordinate
system i is defined, where a direction from the
cross-sectional center of the wrist to the fingertip of the middle
finger is an Xi axis (palm principal axis); a direction
orthogonal to the Xi axis and perpendicular to the palm
is a Yi axis; and a direction orthogonal to both the Xi
and Yi axes is a Zi axis (a in FIG. 37). A camera coordinate
system c (Xc, Yc, and Zc axes; the axes are
orthogonal to one another) onto which hand images
taken from a camera are projected is also set in
advance (b in FIG. 37). Hereinafter, the Zc axis in the
camera coordinate system c is referred to as an optical
axis.

[0084] Further, with respect to the hand image projected
on the camera coordinate system c, differences
between the axes in the local coordinate system i and
the axes in the camera coordinate system c are each
defined as follows (c in FIG. 37):

θ: rotation angle to the Xc axis
φ: rotation angle to an Xc axis - Zc axis plane
ψ: rotation angle to an Xc axis - Yc axis plane

[0085] The hand position is defined by these rotation
angles θ, φ and ψ.
[0086] Although the hand position can be strictly
defined in such a manner, it is also possible to qualitatively
define, with respect to the camera, how the palm
is oriented, as in "vertical or tilted toward the left", and which
direction the palm is facing, as in "front-facing or left-facing".
In the present invention, both definitions are applicable.
For the sake of clarity in the following
embodiments, however, the qualitative definition is
exemplarily adopted for hand position.
[0087] By referring to the accompanying drawings, the
embodiments of the present invention are described in
detail.



(First Embodiment) 

[0088] A first embodiment of the present invention 
provides a device and a method for recognizing hand 
shape and position even if a hand image to be provided 
for recognition is rather complicated in shape. This is 
implemented by a method based on the eigenspace 
method, in which, specifically, a plurality of prestored
hand images varied in hand shape and position and the
to-be-provided hand image are all subjected to normalization
after a wrist region is deleted from each of them.

[0089] FIG. 1 is a block diagram showing the struc- 
ture of the device for recognizing hand shape and posi- 
tion of the first embodiment. In FIG. 1, the device is 
structured by a storage part construction system 1 and 
a shape/position recognition system 2. 
[0090] The storage part construction system 1 con- 
structs, in advance, information necessary for recogniz- 
ing a to-be-inputted hand image (hereinafter, referred to 
as input hand image) according to a plurality of hand 
shape images varied in hand shape and position, and 
shape information and position information on the hand 
shape images. The shape/position recognition system 2 
determines hand shape and hand position of the input 
hand image by utilizing the information constructed by 
the storage part construction system 1. The information
is stored in a storage part in the storage part construction
system 1.

[0091] First, the storage part construction system 1 
and the shape/position recognition system 2 are struc- 
turally described, respectively. In FIG. 1, the storage 
part construction system 1 includes a hand image normalization
part 11, a hand shape image information
storage part 12A, an eigenspace calculation part 13, an 
eigenvector storage part 14, and an eigenspace projec- 
tion part 15, while the shape/position recognition sys- 
tem 2 includes a hand image normalization part 21, an 
eigenspace projection part 22, a hand shape image 
selection part 23, and a shape/position output part 24. 
[0092] The hand image normalization part 11
receives a plurality of hand images varied in hand
shape and position, deletes a wrist region from each of
the hand images before normalization, and then generates
hand shape images. The hand shape images are
stored in the hand shape image information storage
part 12A together with separately-provided shape information
and position information on the hand shape
images, and eigenspace projection coordinates each
obtained by projecting the hand shape images onto an
eigenspace. The eigenspace calculation part 13 calculates
eigenvalues and eigenvectors, under eigenspace
analysis, from the hand shape images stored in
the hand shape image information storage part 12A.
Herein, the eigenspace analysis carried out by the
eigenspace calculation part 13 may take various forms, and
exemplarily includes a technique for calculating an
eigenspace by subjecting the hand shape images
stored in the hand shape image information storage
part 12A to principal component analysis, or a technique
for calculating a hand shape judgement space by
carrying out a judgement analysis on the hand shape
images and shape information stored in the hand shape
image information storage part 12A. In the first embodiment,
the former technique is applied to describe the
following operation. The eigenvector storage part 14
stores the eigenvectors calculated by the eigenspace
calculation part 13. The eigenspace projection part 15
projects the hand shape images stored in the hand
shape image information storage part 12A onto an
eigenspace where the eigenvectors stored in the eigenvector
storage part 14 are used as a basis, and then calculates
projection coordinates in the eigenspace. The
calculated projection coordinates are stored in the hand
shape image information storage part 12A.
[0093] The hand image normalization part 21
receives an input hand image, deletes a wrist region
therefrom before normalization in a predetermined
manner, and then generates an input hand shape
image. This is carried out in a similar manner to the
hand shape images stored in the hand shape image
information storage part 12A. The eigenspace projection
part 22 projects the generated input hand shape
image onto the eigenspace where the eigenvectors
stored in the eigenvector storage part 14 are used as the
basis, and then calculates projection coordinates in the
eigenspace. The hand shape image selection part 23
compares the projection coordinates calculated by the
eigenspace projection part 22 with the eigenspace projection
coordinates stored in the hand shape image
information storage part 12A, and then determines
which hand shape image is the closest to the input hand
shape image. The shape/position output part 24 outputs
shape information and position information on the hand
shape image determined as being the closest by the
hand shape image selection part 23.
[0094] Next, by referring to FIGS. 2 to 5, the method
for recognizing hand shape and position is described
stepwise. FIG. 2 shows the outline of the processing
carried out by the hand image normalization part 11 in
FIG. 1. FIG. 3 is an exemplary storage table provided in
the hand shape image information storage part 12A in
FIG. 1. FIG. 4 shows the outline of an exemplary
method for calculating an eigenspace in the eigenspace
calculation part 13 in FIG. 1. The method exemplarily
illustrated in FIG. 4 is the above-described technique for
applying principal component analysis. FIG. 5 shows
the outline of an exemplary method for calculating
eigenspace projection coordinates in the eigenspace
projection part 15 in FIG. 1.

[0095] First, it is described how the storage part
construction system 1 operates.
[0096] As described in the foregoing, from a plurality
of hand images varied in hand shape and position,
the storage part construction system 1 generates, in
advance, hand shape images for comparison with an
input hand image provided to the shape/position recognition
system 2. At this point in time, the storage part
construction system 1 normalizes the hand images so
as to calculate an eigenspace for the hand shape
images.

[0097] Refer to FIG. 2. The hand image normalization
part 11 first determines the orientation of a given
hand image (which direction the wrist is
oriented) (b in FIG. 2). Thereafter, the hand image normalization
part 11 draws two straight lines, each along the
boundary between the wrist and background, from the
end of the wrist towards the palm. Then, the hand image
normalization part 11 determines, as an end point of the
wrist (wrist cut point), a point where a distance from the
line to the hand contour is equal to a predetermined
threshold value or more (c in FIG. 2). The hand image
normalization part 11 then draws a line, perpendicular
to the straight line, from the wrist cut point. In this manner,
the hand is divided into two regions: a wrist region and
a hand region, and the wrist region is deleted from the
hand image (d in FIG. 2). Thereafter, the hand image
normalization part 11 rotates the hand region in such a
manner that the palm principal axis thereof is oriented
to a certain direction (e in FIG. 2). In this embodiment,
the direction is presumed to be downward. The hand
image normalization part 11 then normalizes the
rotated hand image in such a manner that the size and
contrast thereof each satisfy a predetermined value,
and then generates a hand shape image (f in FIG. 2).
The generated hand shape image is stored in the hand
shape image information storage part 12A together with
shape information indicating finger extension/bending
of the hand shape image (in FIG. 2, 3 extended fingers)
and position information indicating which direction the
palm thereof is facing (in FIG. 2, rear-facing). Note that
the position information may be represented by an
angle with respect to the optical axis. The hand image
normalization part 11 applies such normalization
processing to each of a plurality of hand images
varied in hand shape and position, and then, as shown in
FIG. 3, stores the hand shape images generated thereby
in the hand shape image information storage
part 12A. At this point in time, the eigenspace projection
coordinates are not stored in the hand shape image
information storage part 12A. This is because the
eigenspace projection coordinates are later calculated
and stored by the eigenspace projection part 15.
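
The search for the wrist cut point can be sketched as follows. This is only a minimal illustration, assuming that the distance from the boundary line to the hand contour has already been sampled at regular steps from the wrist end towards the palm; the function name `find_wrist_cut_index`, the sample values, and the threshold are hypothetical and do not come from the specification.

```python
def find_wrist_cut_index(distances, threshold):
    """Return the index of the first sample whose distance from the
    boundary line to the hand contour is at least `threshold`,
    or None when no sample reaches the threshold."""
    for index, distance in enumerate(distances):
        if distance >= threshold:
            return index
    return None


# Hypothetical distances sampled from the wrist end towards the palm.
samples = [0.0, 0.5, 0.4, 1.0, 3.2, 6.5, 9.0]
cut = find_wrist_cut_index(samples, threshold=3.0)
```

The line perpendicular to the boundary line would then be drawn through the sample at index `cut`, and everything on the wrist side of it deleted.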
[0098] Thereafter, the eigenspace calculation part
13 calculates an eigenspace for the hand shape images
stored in the hand shape image information storage
part 12A.

[0099] Refer to FIG. 4. The eigenspace calculation
part 13 first calculates an average image c obtained
from the hand shape images stored in the hand shape
image information storage part 12A (step S1). Thereafter,
the eigenspace calculation part 13 subtracts the average
image c from each of the hand shape images, and
then subjects the images to raster scanning so as to
represent the images by one-dimensional vectors (step
S2). Then, a matrix A having the one-dimensional vectors
lined up as column vectors is determined (step S3).
The eigenspace calculation part 13 then calculates a
covariance matrix Q from the matrix A (step S4), and
then calculates eigenvalues and eigenvectors of the
covariance matrix Q (step S5). Finally, the eigenspace
calculation part 13 calculates an eigenspace where the
calculated eigenvectors (e1, e2, ..., ek) corresponding to
the k largest eigenvalues, k being separately defined, are used
as a basis (step S6).

[0100] With such processing, the eigenspace calculation
part 13 calculates an eigenspace basis, and then
stores a plurality of eigenvectors into the eigenvector
storage part 14.
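
Steps S1 to S6 can be illustrated with a small standard-library sketch. It rests on several assumptions that are not in the specification: the covariance matrix is taken as Q = A A^T without scaling (scaling would not change the eigenvectors), the eigenvectors are found by power iteration with deflation, which is a substitute solver chosen only to keep the example dependency-free, and all function names and the sample data are hypothetical.

```python
import math
import random


def mean_image(images):
    """Step S1: pixel-wise average of the raster-scanned images."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def center(images, mean):
    """Step S2: subtract the average image from each image vector."""
    return [[p - m for p, m in zip(image, mean)] for image in images]


def covariance(centered):
    """Steps S3-S4: Q = A A^T, the columns of A being the centered vectors."""
    dim = len(centered[0])
    return [[sum(x[i] * x[j] for x in centered) for j in range(dim)]
            for i in range(dim)]


def top_eigenpairs(q, k, iterations=500, seed=0):
    """Steps S5-S6: k largest eigenvalues and eigenvectors of the symmetric
    matrix q, by power iteration with deflation (a swapped-in solver)."""
    rng = random.Random(seed)
    q = [row[:] for row in q]  # work on a copy; deflation modifies it
    dim = len(q)
    values, vectors = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        for _ in range(iterations):
            w = [sum(q[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:  # remaining matrix is numerically zero
                break
            v = [c / norm for c in w]
        qv = [sum(q[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        eigenvalue = sum(a * b for a, b in zip(v, qv))
        values.append(eigenvalue)
        vectors.append(v)
        for i in range(dim):  # deflate: remove the component just found
            for j in range(dim):
                q[i][j] -= eigenvalue * v[i] * v[j]
    return values, vectors


# Four hypothetical two-pixel "images", already raster-scanned.
images = [[7.0, 5.0], [3.0, 5.0], [5.0, 6.0], [5.0, 4.0]]
mean = mean_image(images)
values, vectors = top_eigenpairs(covariance(center(images, mean)), k=2)
```

For real image sizes a dedicated symmetric eigensolver would normally replace `top_eigenpairs`; the sketch only shows how the six steps fit together.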

[0101] Thereafter, the eigenspace projection part 
15 calculates, for every hand shape image stored in the 
hand shape image information storage part 12A, 
eigenspace projection coordinates obtained by project- 
ing the hand shape images onto the eigenspace. 
[0102] Refer to FIG. 5. The eigenspace projection
part 15 subjects every hand shape image stored in the
hand shape image information storage part 12A to
raster scanning so as to represent the image by a one-dimensional
vector. Thereafter, the one-dimensional
vector is multiplied by each of the eigenvectors stored
in the eigenvector storage part 14 so as to obtain the
eigenspace projection coordinates. The eigenspace
projection part 15 then stores the eigenspace projection
coordinates into the hand shape image information storage
part 12A.
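
The projection can be sketched as one inner product per eigenvector. Subtracting the average image before projecting is an assumption carried over from step S2 rather than something this paragraph states; passing an all-zero mean follows the paragraph literally. The function name `project` and the numbers are hypothetical.

```python
def project(image, mean, eigenvectors):
    """Return one projection coordinate per eigenvector: the inner product
    of the mean-subtracted, raster-scanned image with that eigenvector."""
    centered = [p - m for p, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centered, vector))
            for vector in eigenvectors]


# Hypothetical four-pixel image, average image, and two-vector basis.
image = [3.0, 1.0, 5.0, 3.0]
average = [1.0, 1.0, 1.0, 1.0]
basis = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
coords = project(image, average, basis)
```

Each stored hand shape image is reduced in this way from one value per pixel to k coordinates, which is what is written back into the storage table.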

[0103] This is the end of the processing carried out
in advance in the storage part construction system 1,
and by then, all the information has been stored
in the hand shape image information storage part 12A
and the eigenvector storage part 14.
[0104] Next, it is described how the shape/position
recognition system 2 operates.
[0105] An input hand image is provided to the hand
image normalization part 21. The hand image normalization
part 21 subjects the input hand image to normalization
in a similar manner to the hand image
normalization part 11 for the purpose of generating an
input hand shape image. In a similar manner to the
eigenspace projection part 15, the eigenspace projection
part 22 calculates eigenspace projection coordinates
for the input hand shape image by using the
eigenvectors stored in the eigenvector storage part 14.
The hand shape image selection part 23 then calculates
a distance (e.g., Euclidean distance) between the
eigenspace projection coordinates of the input hand
shape image and each of the eigenspace projection
coordinates stored in the hand shape image information
storage part 12A. Thereafter, the hand shape image
selection part 23 selects the one hand shape image closest
to the input hand shape image. The
shape/position output part 24 then outputs shape information
and position information on the hand shape image
judged as being the closest to the input hand shape
image.

[0106] In this manner, the hand shape and position
of the input hand image can be simultaneously determined.
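
The selection step amounts to a nearest-neighbour search over the stored projection coordinates. The sketch below assumes a storage table held as a list of dictionaries; the keys `shape`, `position`, and `coords`, the function name `closest_entry`, and all values are hypothetical stand-ins for the table of FIG. 3.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two coordinate lists of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_entry(input_coords, table):
    """Return the stored entry whose projection coordinates lie closest
    to the coordinates of the input hand shape image."""
    return min(table, key=lambda entry: euclidean(input_coords, entry["coords"]))


# Hypothetical storage table with precomputed projection coordinates.
table = [
    {"shape": "three extended fingers", "position": "rear-facing", "coords": [4.0, 2.0]},
    {"shape": "fist", "position": "front-facing", "coords": [-3.0, 0.5]},
    {"shape": "five extended fingers", "position": "left-facing", "coords": [0.0, -5.0]},
]
match = closest_entry([3.5, 1.0], table)
```

Because shape and position are stored on the same entry, a single lookup yields both, which is why the two can be determined simultaneously.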

[0107] In a typical hardware environment, the
device of the first embodiment is structured by a storage
device (e.g., ROM, RAM, hard disk) on which predetermined
program data is recorded, a CPU, and input/output
devices. FIG. 6 shows an exemplary hardware
structure which implements the device of the first
embodiment.

[0108] In FIG. 6, a storage device 50 is exemplarily
a hard disk, and operates as the hand shape image
information storage part 12A and the eigenvector storage
part 14. A CPU 51 is a central processing unit
which controls the operation of the other constituents. A
memory 52 temporarily stores data when the constituents
are operated. An image input device 53 is exemplarily
a video capture card, and receives input hand
images for recognition. An input device 54 receives a
plurality of hand shape images varied in hand shape
and position, and shape information and position information
on the hand shape images. An output device 55
outputs data indicating any recognized hand shape and
hand position. With such a hardware structure, the device
of the first embodiment can be realized. In such a case,
each processing carried out by the device of the first
embodiment is provided, separately, in the form of program
data. The program data may be installed via a
recording medium such as a CD-ROM or a floppy disk.
[0109] When the device of the first embodiment is
used as an interface for other devices, the following constituents
may be provided: an instruction storage part
capable of storing instructions relevant to shape information
and position information; and an instruction output
part capable of outputting such instructions. The
instruction storage part exemplarily stores, as shown in
FIG. 7, instructions to other devices relevant to the shape
information and position information. FIG. 7 shows
an exemplary case where instructions to an audio
device are stored. The instruction output part outputs,
according to the shape information and position information
outputted from the shape/position output part
24, instructions corresponding thereto to other devices.
In FIG. 7, for example, when the shape/position output
part 24 outputs the shape information indicating "five
extended fingers" and the position information indicating
"every direction", the instruction output part outputs
an instruction to the audio device to "start". In this manner,
the device of the first embodiment can be used as
an interface for other devices.
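
An instruction storage part of this kind can be pictured as a lookup table keyed by shape and position. In the sketch below only the "five extended fingers" / "every direction" / "start" row comes from the description of FIG. 7; the second row, the treatment of "every direction" as a wildcard, and the names `INSTRUCTIONS` and `lookup_instruction` are assumptions made for illustration.

```python
# Table keyed by (shape information, position information).
INSTRUCTIONS = {
    ("five extended fingers", "every direction"): "start",
    ("fist", "front-facing"): "stop",  # hypothetical row, not from FIG. 7
}


def lookup_instruction(shape, position, table=INSTRUCTIONS):
    """Return the instruction for the recognized shape and position,
    falling back to an "every direction" row for that shape, else None."""
    if (shape, position) in table:
        return table[(shape, position)]
    return table.get((shape, "every direction"))
```

With this table, `lookup_instruction("five extended fingers", "left-facing")` falls through to the wildcard row, while an unlisted combination yields `None` so that no instruction is sent.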

[0110] As is known from the above, according to the
device and method for recognizing hand shape and
position of the first embodiment, a plurality of hand
images varied in hand shape and position and an input
hand image are all subjected to wrist region deletion
processing before normalization. In this manner, the hand
images can be more accurately normalized than in a case
where the size and contrast thereof are simply normalized.
Accordingly, the accuracy in recognition can be
sufficiently high even if the method based on the
eigenspace is applied to recognize hand shape and
position.

[0111] Further, by using the method based on the
eigenspace, geometric characteristics such as the
number of extended fingers can be recognized,
whereby even rather complicated hand shapes having few
geometric characteristics can be correctly recognized.
[0112] Further, when a comparison is made
between an input hand shape image and a plurality of
hand shape images for matching, the volume of images
may become enormous. In the first embodiment, however,
hand images are projected onto an eigenspace,
after normalization, to calculate eigenspace projection
coordinates, and then the coordinates are compared
with those of the input hand shape image in the
eigenspace. In this manner, the calculation load is
eased compared with a case where the comparison is
made between images, thereby increasing the
processing speed. As is obvious from this, the
method based on the eigenspace is very practical when
the volume of hand shape images is expected to be
enormous.

[0113] In the first embodiment, human hand images
are assumed to be stored as hand shape images varied in
hand shape and position for recognition. The problem
herein is that such real images cannot be taken from
some directions, since a human hand cannot be put on
a turntable, and a human being cannot stay still enough to
hold a certain posture with considerable accuracy.
Special equipment is needed for taking hand
images from every direction. To get around such a problem,
a 3D hand model popular in CAD and CG may be
used, and images thereof can be taken from several
directions. In this manner, a hand shape image can be
defined by hand shape and position with a high degree
of accuracy. A mannequin hand is also a possibility. In
the first embodiment, both the 3D hand model image
and real hand image can be handled under the same
structure and method.

[0114] Further, in the first embodiment, hand shape
and hand position are each assumed to be limited to one
for output. However, due to image resolution, for example,
some images may look identical in hand shape or
hand position in some cases. If this is the case, a
plurality of hand shapes or hand positions
may each be outputted as possibilities. Even so, this can be
realized under the same structure and method as the
first embodiment. Still further, in the first embodiment,
both the hand shape image and input hand shape
image are assumed to be contrast images. However, the
images may be silhouette images or color images, which
can be handled under the same structure and method as in
the first embodiment.



(Second Embodiment) 

[0115] In a case where a plurality of hand images
varied in hand shape and position are classified, the
classification may be made according to hand shape or
hand position. If this is the case, however, some hand
images are not distinguishable when they are analogous
in hand position from a certain direction but different in
hand shape, e.g., it cannot be clearly told from the side how
many fingers are extended, or when they are analogous in
hand shape but different in hand position, e.g., it cannot
be clearly told in which direction a fist is oriented. Therefore,
such classification may not work well enough for
hand shape/position recognition.

[0116] Accordingly, relevant to the device and the 
method for recognizing hand shape and position under 
the eigenspace method described in the first embodi- 
ment, in the second embodiment, the frequency of com- 
parison for matching can be reduced and the 
processing speed can be improved. This is imple- 
mented by, under cluster evaluation, automatically clas- 
sifying, into clusters, the eigenspace projection 
coordinates obtained for every hand shape image 
stored in the hand shape image information storage 
part 12A. Thereafter, it is decided to which cluster an
input hand image belongs, and then it is decided which
hand shape image in the cluster is the closest to the
input hand image.

[0117] FIG. 8 is a block diagram showing the structure
of a device for recognizing hand shape and position
of the second embodiment. In FIG. 8, the device is
structured by, similarly to the device of the first embodiment,
the storage part construction system 1 and the
shape/position recognition system 2.
[0118] In FIG. 8, the storage part construction system
1 is provided with the hand image normalization
part 11, a hand shape image information storage part
12B, the eigenspace calculation part 13, the eigenvector
storage part 14, the eigenspace projection part 15, a
cluster evaluation part 16, and a cluster information
storage part 17A, while the shape/position recognition
system 2 is provided with the hand image normalization
part 21, the eigenspace projection part 22, a maximum
likelihood cluster judgement part 25, an image comparison
part 26, and the shape/position output part 24.
[0119] As shown in FIG. 8, the device of the second
embodiment is provided, in the storage part construction
system 1, with the hand shape image information
storage part 12B as an alternative to the hand shape
image information storage part 12A in the device of the
first embodiment, and further includes the cluster evaluation
part 16 and the cluster information storage part
17A, while in the shape/position recognition system 2,
the maximum likelihood cluster judgement part 25 and
the image comparison part 26 are provided as alternatives
to the hand shape image selection part 23.
[0120] Other constituents in the device of the second
embodiment are the same as those in the device of
the first embodiment, and are denoted by the same reference
numerals and not described again.
[0121] First, the storage part construction system 1
and the shape/position recognition system 2 in the second
embodiment are structurally described, focusing
on the constituents that differ from those in the
device of the first embodiment.

[0122] The hand shape image information storage
part 12B stores a plurality of hand shape images generated
by the hand image normalization part 11. Together
therewith, the hand shape image information storage
part 12B also stores shape information and position
information on the hand shape images, and eigenspace
projection coordinates obtained by projecting the hand
shape images onto an eigenspace. Unlike the hand
shape image information storage part 12A in the first
embodiment, the hand shape image information storage
part 12B also stores cluster indexes (hereinafter,
referred to as cluster IDs) obtained through clustering
automatically carried out on the hand shape images.
The cluster evaluation part 16 classifies, under cluster
evaluation, the hand shape images stored in the hand
shape image information storage part 12B into clusters,
and then determines which hand shape image goes to
which cluster. Thereafter, the cluster evaluation part 16
stores cluster IDs, which identify the clusters, into the
hand shape image information storage part 12B, and
then obtains statistical information relevant to each cluster.
The cluster information storage part 17A stores the
cluster IDs and the statistical information obtained by
the cluster evaluation part 16.

[0123] The maximum likelihood cluster judgement
part 25 determines the cluster that includes the projection
coordinates closest to the eigenspace projection
coordinates calculated by the eigenspace projection
part 22. The image comparison part 26 then
refers to the hand shape image information storage part
12B for the hand shape images included in the
determined cluster, and therefrom selects the one hand shape
image most closely analogous to the input hand
shape image generated by the hand image
normalization part 21.

[0124] Next, by referring to FIGS. 9 to 11, the
method for recognizing hand shape and position carried
out by the device of the second embodiment is
described stepwise. FIG. 9 is an exemplary storage
table in the hand shape image information storage part
12B in FIG. 8. FIG. 10 is a flowchart exemplarily
showing the operation of the cluster evaluation part 16 in FIG.
8. The exemplary case shown in FIG. 10 applies the
ISODATA method, which is a technique in cluster evaluation.
FIG. 11 is a diagram exemplarily showing a concept of a
comparison technique carried out by the image
comparison part 26 in FIG. 8. The comparison technique shown
in FIG. 11 exemplarily applies simple pattern
matching.

[0125] First, the operation of the storage part
construction system 1 is described.



[0126] In a similar manner to the first embodiment,
before normalization, the hand image normalization
part 11 deletes a wrist region from each of a plurality of
hand images varied in hand position, and then
generates hand shape images. The generated hand shape
images are stored into the hand shape image
information storage part 12B together with shape information
and position information thereon, as shown in FIG. 9. At
this point in time, the cluster IDs and the eigenspace
projection coordinates are not yet stored in the hand
shape image information storage part 12B. This is
because the eigenspace projection coordinates and the
cluster IDs are obtained later, by the eigenspace
projection part 15 and the cluster evaluation part 16,
respectively.

[0127] Thereafter, an eigenspace is calculated
under the eigenspace method, in a similar manner to
the first embodiment, by the eigenspace calculation part
13, the eigenvector storage part 14, and the eigenspace
projection part 15. Onto the eigenspace, the hand
shape images stored in the hand shape image
information storage part 12B are projected. Then, the
eigenspace projection coordinates obtained thereby are
stored into the hand shape image information storage
part 12B.

[0128] The cluster evaluation part 16 performs cluster
evaluation with respect to the eigenspace projection
coordinates stored in the hand shape image information
storage part 12B, and then classifies these hand shape
images into clusters according to the distance
thereamong. Such cluster evaluation can be carried out in the
cluster evaluation part 16 in various manners, such as a
simple reallocation method (k-means method) or the
ISODATA method. Herein, an exemplary clustering technique
applying the ISODATA method is described.
[0129] ISODATA method is typical for non-hierarchi- 
cal clustering, and is a combination of clustering under 
the reallocation method and a process of cluster split- 
ting and merging. 

[0130] By referring to FIG. 10, the cluster evaluation
part 16 first sets initial parameters (step S101). The
initial parameters may include the eventual number of
clusters, a convergence condition for reallocation, a
judgement condition for a cluster very small in number
of hand shape images or for isolated data, a condition
for cluster splitting and merging, or a termination
condition for repetitive calculation. Then, the cluster
evaluation part 16 selects some of the clusters as a reference
for clustering (initial clusters) (step S102). The
coordinates of the initial clusters are arbitrarily determined by
referring to the projection coordinates of each of the
hand shape images.

[0131] Then, the cluster evaluation part 16
performs clustering under the reallocation method. First of
all, the cluster evaluation part 16 calculates each
distance between the eigenspace projection coordinates of
the hand shape images and the center coordinates of
the initial clusters in the eigenspace, and then
reallocates each hand shape image into the closest
cluster (step S103). The cluster evaluation part 16
then recalculates the center coordinates of each cluster
according to the eigenspace projection coordinates of
the reallocated hand shape images (step S104). The
cluster evaluation part 16 judges whether or not the
number of reallocated hand shape images is a
predetermined threshold value or smaller (i.e., whether the
clustering is converged) (step S105). If the number of
reallocated images is judged as being the threshold value or
smaller in step S105, the cluster evaluation part 16
terminates the clustering processing under the
reallocation method. If not, the procedure returns to step S103
to repeat the procedure.
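
The reallocation loop of steps S103 to S105 can be illustrated by the following minimal sketch. It is not taken from the specification: the function name `reallocate`, the parameters `max_moved` and `max_iter`, and the choice of Euclidean distance via `math.dist` are assumptions made for illustration only.

```python
import math

def reallocate(points, centers, max_moved=0, max_iter=100):
    """Reallocation (k-means style) clustering in the eigenspace.

    points    -- eigenspace projection coordinates, one tuple per image
    centers   -- center coordinates of the initial clusters (step S102)
    max_moved -- hypothetical convergence threshold on reassigned images
    """
    centers = list(centers)
    labels = [None] * len(points)
    for _ in range(max_iter):
        moved = 0
        # Step S103: put every image into the cluster with the closest center.
        for i, p in enumerate(points):
            nearest = min(range(len(centers)),
                          key=lambda k: math.dist(p, centers[k]))
            if nearest != labels[i]:
                labels[i] = nearest
                moved += 1
        # Step S104: recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
        # Step S105: converged once few enough images changed cluster.
        if moved <= max_moved:
            break
    return labels, centers
```

The returned labels play the role of the cluster IDs; cluster removal, splitting and merging (steps S106 to S109) are outside this sketch.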

[0132] If the clustering is judged as being
converged in step S105, the cluster evaluation part 16 then
removes any cluster very small in number of hand
shape images and any hand shape image
apparently isolated from others (step S106). Next, the cluster
evaluation part 16 determines whether or not the
current number of clusters falls within a predetermined
range of the eventual number of clusters, and whether
or not the minimum value for a distance between the
center coordinates of the clusters is a predetermined
threshold value or smaller (step S107). If the minimum
value for the distance is judged as being the threshold
value or smaller in step S107, the cluster evaluation part
16 determines that the clustering is converged, and
then stores information on each cluster (e.g., statistical
information such as cluster IDs, an average value of the
coordinates in the eigenspace, or distribution) into the
cluster information storage part 17A, and the cluster IDs
indicating which hand shape image belongs to which
cluster into the hand shape image information storage
part 12B (step S108). If the minimum value for the
distance is not judged as being the threshold value or
smaller in step S107, on the other hand, the cluster
evaluation part 16 carries out cluster splitting or merging
(step S109). In step S109, when the current number of
clusters is too large to fall within the predetermined
range, the cluster evaluation part 16 carries out cluster
splitting, and when the number is too small to fall within
the predetermined range, the cluster evaluation part 16
carries out cluster merging. When the number falls
within the range, the cluster evaluation part 16 carries
out cluster merging or splitting depending on how many
times the processing was repeated: merging for an even
number and splitting for an odd number.
[0133] In cluster merging, on condition that the
minimum distance is the threshold value or smaller, the
cluster evaluation part 16 merges the two clusters
having the minimum distance therebetween into one, and
finds new center coordinates thereof. Then, the cluster
evaluation part 16 again calculates the distance, and
keeps performing cluster merging until the minimum
distance becomes the threshold value or larger.
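
A minimal sketch of the merging loop follows. The specification does not say how the new center coordinates are computed, so the size-weighted mean used here is an assumption, as are the function name `merge_close_clusters` and the decision to stop as soon as the minimum distance is equal to or larger than the threshold.

```python
import math
from itertools import combinations

def merge_close_clusters(centers, sizes, threshold):
    """Repeatedly merge the two closest clusters.

    centers   -- center coordinates of the clusters
    sizes     -- number of hand shape images in each cluster
    threshold -- merging stops once the minimum distance reaches it
    """
    centers = [tuple(c) for c in centers]
    sizes = list(sizes)
    while len(centers) > 1:
        # Find the pair of clusters with the minimum center distance (i < j).
        i, j = min(combinations(range(len(centers)), 2),
                   key=lambda ij: math.dist(centers[ij[0]], centers[ij[1]]))
        if math.dist(centers[i], centers[j]) >= threshold:
            break
        # Assumed rule: new center is the size-weighted mean of the two.
        n = sizes[i] + sizes[j]
        merged = tuple((a * sizes[i] + b * sizes[j]) / n
                       for a, b in zip(centers[i], centers[j]))
        del centers[j], sizes[j]  # j > i, so index i stays valid
        centers[i], sizes[i] = merged, n
    return centers, sizes
```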
[0134] In cluster splitting, when a maximum value
for distribution in one cluster is a predetermined
threshold value or larger, the cluster evaluation part 16 splits
the cluster into two according to a first component, and
then calculates new center coordinates and a
distribution value for each of the split clusters. The cluster
splitting is repeated until the maximum value for distribution
becomes the threshold value or smaller.
[0135] After the cluster splitting and merging in step
S109 is through, the procedure returns to step S103
for the same processing.

[0136] With such processing, the cluster evaluation
is completed, and the information on the respective
clusters, such as the statistical information including the
cluster IDs, the average value of the coordinates in the
eigenspace, or distribution, is stored in the cluster
information storage part 17A. Also, the cluster IDs
indicating which hand shape image belongs to which
cluster are stored in the hand shape image information
storage part 12B. The parameters used in the cluster
evaluation may be selected appropriately by experiment,
for example, or it may be possible to designate the
eventual number of clusters and a criterion for cluster
splitting and merging according to a certain criterion for
information content (e.g., AIC, MDL). Although the cluster
evaluation under the ISODATA method is described in this
embodiment, cluster evaluation under the simple
reallocation method may have the same effects as the
ISODATA method if parameters such as the threshold value
are properly set.
[0137] This is the end of the processing carried out
in advance in the storage part construction system 1,
and by then, all of the information has been stored
in the hand shape image information storage part 12B,
the eigenvector storage part 14, and the cluster
information storage part 17A.

[0138] Next, the operation of the shape/position
recognition system 2 is described.
[0139] An input hand image for recognition is
provided to the hand image normalization part 21. In a
similar manner to the first embodiment, the input hand
image is normalized by the hand image normalization
part 21 and then is represented by the eigenspace
projection coordinates by the eigenspace projection part
22. The maximum likelihood cluster judgement part 25
calculates a distance between the eigenspace
projection coordinates and each of the coordinates of the cluster
information stored in the cluster information storage
part 17A, and then determines which cluster includes
the hand shape image located closest to the input hand
shape image. A method therefor may include a
technique applying Euclidean distance for the clusters, a
technique applying Mahalanobis distance for the
clusters, or a technique for determining likelihood for every
cluster under the maximum likelihood method, and then
finding the one cluster whose likelihood is the highest.
Herein, the technique for finding the closest cluster
under the maximum likelihood method is exemplarily
described.

[0140] First of all, the maximum likelihood cluster
judgement part 25 finds, as the statistical information on
the clusters, an average $\mu_i$ from the eigenspace
projection coordinates u of the images included in each
cluster i stored in the cluster information storage part 17A,
and regards the average $\mu_i$ as the cluster center
coordinates. Further, the maximum likelihood cluster
judgement part 25 obtains a covariance matrix $\Sigma_i$
from the eigenspace projection coordinates u of the
respective images and the cluster center coordinates. With
these values, a likelihood function $G_i(u)$ relevant to a
cluster i can be defined by the following equation 4, where
$\chi^2$ indicates a Mahalanobis distance between the
eigenspace projection coordinates u and the cluster i.

$$G_i(u) = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}\chi^2(u;\, \mu_i, \Sigma_i) \qquad (4)$$



[0141] With this likelihood function $G_i(u)$,
the cluster having the highest likelihood is found.
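
The selection of the most likely cluster can be sketched as follows, assuming that equation 4 has the usual Gaussian log-likelihood form (a log-determinant term plus a Mahalanobis term, each weighted by one half). To stay within the standard library, the sketch further assumes a diagonal covariance matrix, so that each cluster is summarized by a mean and per-axis variances; a full covariance matrix would require a matrix inverse instead. The names `log_likelihood` and `most_likely_cluster` are hypothetical.

```python
import math

def log_likelihood(u, mean, variances):
    """G_i(u) for a cluster with a diagonal covariance matrix (assumption).

    u         -- eigenspace projection coordinates of the input image
    mean      -- cluster center coordinates
    variances -- per-axis variances, i.e. the diagonal of the covariance
    """
    log_det = sum(math.log(v) for v in variances)  # ln|Sigma_i|
    chi2 = sum((x - m) ** 2 / v                    # squared Mahalanobis distance
               for x, m, v in zip(u, mean, variances))
    return -0.5 * log_det - 0.5 * chi2

def most_likely_cluster(u, clusters):
    """clusters maps a cluster ID to its (mean, variances) statistics."""
    return max(clusters, key=lambda cid: log_likelihood(u, *clusters[cid]))
```

Because both terms carry the same factor, the choice of that factor does not change which cluster is selected.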
[0142] Note that, if the registered hand shapes are
small in number, the above-described techniques (the
technique applying Euclidean distance or Mahalanobis
distance) can be similarly effective.
[0143] Then, the image comparison part 26 refers 
to the cluster IDs stored in the hand shape image infor- 
mation storage part 12B so as to find one hand shape 
image analogous most closely to the input hand shape 
image. This is done by comparing the hand shape 
images included only in the cluster selected by the max- 
imum likelihood cluster judgement part 25 with the input 
hand shape image generated by the hand image nor- 
malization part 21. Herein, although the comparison 
can be done in various manners, a simple pattern 
matching will do. Thereafter, the shape/position output 
part 24 outputs shape information and position informa- 
tion on the hand shape image selected by the image 
comparison part 26. 
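
The comparison restricted to one cluster can be sketched as below. The specification only calls for "simple pattern matching", so the sum of squared pixel differences is an assumed similarity measure, and the record layout (keys `cluster_id`, `pixels`, `shape`, `position`) is a hypothetical stand-in for the storage table of FIG. 9.

```python
def closest_image(input_pixels, records, cluster_id):
    """Return (shape, position) of the stored image most similar to the input.

    input_pixels -- normalized input hand shape image as a flat list
    records      -- stored images; each is a dict (hypothetical layout)
    cluster_id   -- cluster chosen by the maximum likelihood judgement
    """
    # Only images in the selected cluster are compared.
    candidates = [r for r in records if r["cluster_id"] == cluster_id]

    def ssd(record):
        # Assumed measure: sum of squared pixel differences.
        return sum((a - b) ** 2
                   for a, b in zip(input_pixels, record["pixels"]))

    best = min(candidates, key=ssd)
    return best["shape"], best["position"]
```

Images outside the selected cluster are never compared, which is where the reduction in comparison count comes from.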

[0144] As is known from the above, according to the 
device and the method for recognizing hand shape and 
position of the second embodiment, first in the storage 
part construction system 1, a plurality of hand shape
images stored in the hand shape image information 
storage part 12B are classified, in an eigenspace, into 
clusters under cluster evaluation. And in the shape/posi- 
tion recognition system 2, it is first decided to which 
cluster an input hand image belongs, and to which hand 
shape image in the cluster the input hand image is most 
analogous. In this manner, the frequency of image com- 
parison is reduced, and accordingly higher-speed 
processing can be achieved. 

[0145] Further, the images are not classified 
according to hand shape or hand position, but accord- 
ing to the distance in the eigenspace, in other words, 
according to analogousness in image. In this manner, it 
is possible to accurately define each image by hand 
shape and position even if the images are analogous in 
hand position from a certain direction but different in 
hand shape. 



[0146] In the second embodiment, human hand
images are supposedly stored as hand shape images
varied in hand shape and position. However, as in the
first embodiment, a 3D hand model popular in CAD and
CG may be used, and images thereof can be taken in
from several directions. In this manner, the images can
be defined by hand position with a high degree of
accuracy. A mannequin hand is also a possibility.
[0147] Further, in the second embodiment, hand
shape and hand position are each supposedly limited to
one for output. However, due to image resolution, for
example, some images may be indistinguishable
in hand shape or hand position in some cases.
If this is the case, a plurality of hand shapes or hand
positions may each be output as possibilities.
Even so, this can be realized under the same structure
and method as the second embodiment. Still further, in
the second embodiment, the image comparison part 26
is utilized to group the images analogous to one
another. However, in some cases, it is
enough for the images to be defined only by hand shape.
If this is the case, for every hand shape in the cluster
selected by the maximum likelihood cluster judgement part 25,
a statistic of the average image or distributed image is
referred to for determining the hand shape images. Then, the
input hand image is compared therewith. In order to
implement the device of the second embodiment in
hardware, the device can be structured similarly to the
one shown in FIG. 6.

[0148] Still further, the image comparison part 26 in
the device of the second embodiment may be differently
structured for comparing the hand shape images
included in the cluster selected by the maximum
likelihood cluster judgement part 25 with the input hand
shape image generated by the hand image
normalization part 21. In detail, as alternatives to the image
comparison part 26, it is possible to provide an identical
shape classification part for classifying the hand shape
images included in the same cluster according to hand
shape; a shape group statistic calculation part for
calculating a statistic representing each classified cluster;
and a maximum likelihood shape judgement part for
calculating a distance between the input hand shape
image and the statistic calculated by the shape group
statistic calculation part, and then outputting one hand
shape included in the closest cluster. With such a
structure, the frequency of comparison for matching is
decreased to a greater degree, and accordingly still
higher-speed processing can be achieved.


(Third Embodiment) 

[0149] As is described in the second embodiment,
after cluster evaluation, each cluster includes images
analogous to one another. Therefore, as shown in FIG.
13, a hand image of two fingers abutting each other
and a hand image of one finger overlying another are
classified into the same cluster. In sign language,
however, such a difference in hand shape needs to be
discriminated. For such discrimination, unlike the manner
in the second embodiment in which an image is
discriminated in its entirety, the image needs to be
discriminated partially, only for the part that differs.
[0150] Accordingly, instead of extensively
comparing the input hand shape image with the hand shape
images by pattern matching, in a third embodiment of
the present invention, a discrimination frame is provided
for every cluster in advance, and then hand shape is
discriminated within the frame.

[0151] FIG. 12 is a block diagram showing the
structure of a device for recognizing hand shape and
position of the third embodiment. In FIG. 12, the device
of the third embodiment is structured, similarly to the
device of the second embodiment, by the storage part
construction system 1 and the shape/position
recognition system 2.

[0152] In FIG. 12, the storage part construction
system 1 is provided with the hand image normalization
part 11, the hand shape image information storage part
12B, the eigenspace calculation part 13, the
eigenvector storage part 14, the eigenspace projection part 15, a
cluster evaluation/frame discrimination part 18, and a
cluster information storage part 17B, while the
shape/position recognition system 2 is provided with the
hand image normalization part 21, the eigenspace
projection part 22, the maximum likelihood cluster
judgement part 25, an image comparison part 27, and the
shape/position output part 24.

[0153] As shown in FIG. 12, the device of the third
embodiment is, in the storage part construction system
1, provided with the cluster evaluation/frame
discrimination part 18 and the cluster information storage part 17B
as alternatives to the cluster evaluation part 16 and the
cluster information storage part 17A, respectively, in the
device of the second embodiment. Similarly, in the
shape/position recognition system 2, the image
comparison part 27 is provided as an alternative to the
image comparison part 26.

[0154] Other constituents in the device of the third 
embodiment are similar to those in the device of the 
second embodiment, and are denoted by the same ref- 
erence numerals and not described again. 
[0155] By referring to FIGS. 12 and 14, the structure
and operation of the storage part construction system 1
and the shape/position recognition system 2 in the third
embodiment are described, focusing on the
constituents that differ from those in the device of the
second embodiment. FIG. 14 is a diagram exemplarily
showing how the location of the shape discrimination
frames is calculated by the cluster evaluation/frame
discrimination part 18 in FIG. 12.

[0156] The cluster evaluation/frame discrimination
part 18 first performs cluster evaluation for the
eigenspace projection coordinates stored in the hand
shape image information storage part 12B, and then
classifies the hand shape images according to the
closeness in distance. This is carried out in a similar
manner to the cluster evaluation part 16 in the second
embodiment.

[0157] Next, the cluster evaluation/frame
discrimination part 18 calculates the location of the shape
discrimination frame for every cluster. Refer to FIG. 14.
From each cluster, the cluster evaluation/frame
discrimination part 18 extracts a plurality of hand shape images
identical in hand shape and averages them, so that
an average image for each hand shape is obtained.
Thereafter, the cluster evaluation/frame
discrimination part 18 moves a predetermined frame fixed
in form (the form thereof is arbitrary, exemplarily square
in FIG. 14) over each average image, and finds a
difference among the images within the frame. The shape
discrimination frame is then set on the part showing the
largest difference. Then, the cluster evaluation/frame
discrimination part 18 stores the location of the shape
discrimination frame into the cluster information storage
part 17B.
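
The search for the frame location can be sketched as follows. The specification does not define how the "difference among images" is measured, so the per-pixel spread (maximum minus minimum across the average images) summed over the frame is an assumption, as is the function name `frame_location` and the representation of images as nested lists.

```python
def frame_location(average_images, frame_h, frame_w):
    """Return (top, left) of the frame where the average images differ most.

    average_images -- one averaged image per hand shape in the cluster,
                      each a list of rows of pixel values
    frame_h, frame_w -- fixed size of the shape discrimination frame
    """
    rows, cols = len(average_images[0]), len(average_images[0][0])
    # Assumed difference measure: per-pixel spread across the average images.
    spread = [[max(img[r][c] for img in average_images)
               - min(img[r][c] for img in average_images)
               for c in range(cols)]
              for r in range(rows)]
    best, best_score = None, -1
    # Slide the frame over every position and keep the largest difference.
    for top in range(rows - frame_h + 1):
        for left in range(cols - frame_w + 1):
            score = sum(spread[r][c]
                        for r in range(top, top + frame_h)
                        for c in range(left, left + frame_w))
            if score > best_score:
                best, best_score = (top, left), score
    return best
```

At recognition time, only the pixels inside the returned frame would be compared, as described for the image comparison part 27.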

[0158] The image comparison part 27 first refers to
the cluster IDs stored in the hand shape image
information storage part 12B, and obtains the hand shape
images included in the cluster selected by the maximum
likelihood cluster judgement part 25 and the input hand
shape image generated by the hand image
normalization part 21. The image comparison part 27 also obtains
the location of the shape discrimination frame for the
cluster selected by the maximum likelihood cluster
judgement part 25 from the cluster information storage
part 17B. The image comparison part 27 then
compares the obtained hand shape images and the input
hand shape image only for the parts within the shape
discrimination frame, and determines which hand
shape image is most closely analogous to the input
hand shape image.

[0159] As is known from the above, according to the
device and the method for recognizing hand shape and
position of the third embodiment, the location of the
shape discrimination frame is predetermined, and then
the comparison for matching between the hand shape
images and the input hand shape image is done only for
the parts within the frame. In this manner, the comparison
for matching involves less processing than in the second
embodiment, and accordingly still higher-speed
processing can be achieved with a higher degree of
accuracy.

(Fourth Embodiment) 


[0160] In the second embodiment, the image
comparison part 26 directly compares the input hand shape
image with the hand shape images in the cluster
selected by the maximum likelihood cluster judgement
part 25 for the purpose of defining the input hand image
by hand shape and position. In a fourth embodiment of
the present invention, instead, the hand shape and the
hand position are determined by first photographing a
hand in a certain shape and position from several
directions with a plurality of cameras, secondly by assigning
each obtained image to the appropriate cluster by
the maximum likelihood cluster judgement part 25, and
finally by combining the relevant shape information and
the position information in each of the clusters.
[0161] Since the device of the fourth embodiment is
structurally similar to the device in the second
embodiment, no drawing is provided therefor. By referring to
FIGS. 8 and 15, the structure and operation of the
shape/position recognition system 2 in the fourth
embodiment are described, focusing on the constituents
that differ from those in the device of the
second embodiment.
FIG. 15 shows an exemplary concept of determining, in
the device of the fourth embodiment, one hand shape
image from input hand images obtained from several
cameras. FIG. 15 exemplarily shows a case where
three cameras are used.

[0162] First of all, as shown in FIG. 15, a hand in a
certain shape and position is supposedly photographed
from three different directions by using three cameras.
Accordingly, three input hand images are acquired.
These three input hand images are processed
respectively in the hand image normalization part 21, the
eigenspace projection part 22, and the maximum
likelihood cluster judgement part 25, and are each
assigned to the appropriate cluster. Thereafter, by
referring to these three clusters for shape information
and position information included therein, the image
comparison part 26 finds the one hand shape image most
closely analogous to these three input hand shape
images under the following conditions (1) and (2).

(1) Be identical in hand shape.

(2) Hand position should correspond to camera
location.

[0163] In detail, according to the condition (1), the
image comparison part 26 first extracts any hand shape
found in all of the clusters (in the example shown in FIG.
15, one extended finger). Then, according to the
condition (2), the image comparison part 26 finds the hand
position corresponding both to the extracted hand
shape and to the camera location for the respective input
hand images, and eventually one hand shape image is
determined. In the example shown in FIG. 15, if an image of
the back of the hand is selected for the first camera, then,
for consistency, an image of the down-facing palm and an
image of the front-facing hand are selected for
the second and the third cameras, respectively.
[0164] In this manner, one hand shape image which
most sufficiently satisfies the conditions is selected for
the input hand images obtained from the cameras, and
accordingly the input hand images can be defined by
hand shape and position.
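
Conditions (1) and (2) can be illustrated by the following sketch. The data model is an assumption made for illustration: each camera's cluster is reduced to a set of (shape, view) pairs, and a hypothetical table `views` states which view each camera would observe for a given overall hand pose. Neither the table nor the function name `combine_cameras` appears in the specification.

```python
def combine_cameras(candidates, views):
    """Combine per-camera cluster contents into consistent (shape, pose) pairs.

    candidates -- one set per camera of (shape, view) pairs found in the
                  cluster selected for that camera's input hand image
    views      -- hypothetical table: pose -> tuple of the views that
                  camera 1, camera 2, ... would see for that pose
    """
    # Condition (1): keep only hand shapes found in every camera's cluster.
    common = set.intersection(*({shape for shape, _ in c} for c in candidates))
    results = []
    for shape in sorted(common):
        for pose, per_camera in views.items():
            # Condition (2): each camera's view must match its location.
            if all((shape, view) in c
                   for view, c in zip(per_camera, candidates)):
                results.append((shape, pose))
    return results
```

An empty result would mean that no stored hand shape is consistent across all cameras.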
[0165] As is known from the above, according to the
device and method for recognizing hand shape and
position of the fourth embodiment, input hand images
obtained from a plurality of cameras can be defined by
hand shape and position by combining, with
consideration for camera location, shape information and position
information in the clusters determined for each of the
input hand images. In this manner, even a hand image
which has been difficult to recognize from one direction
(e.g., a hand image from the side) can be defined by
hand shape and position with accuracy.
[0166] Note that, in the fourth embodiment, the
shape information and the position information relevant
to the input hand images from every camera are
combined consistently. However, it may not always be
necessary to apply information from every camera to obtain
the most probable hand shape and position. Further,
although three cameras are exemplarily used in the
fourth embodiment, the number of cameras is not
restricted thereto.

(Fifth Embodiment) 

[0167] The input hand image for recognition in the
second embodiment is supposed to be a static image
(e.g., indicating "one" by extending an index finger).
However, in gesture and sign language, the hand may
be successively moved to indicate one meaning, and
the input hand image therefor is supposed to be a
time-varying image (e.g., to show directions, extending one's
arm and then pointing in the direction with one's index
finger). For such hand movement captured as a time-varying
image, the device of the second embodiment is not
sufficiently capable of recognizing what the hand
movement means.

[0168] Therefore, a device for recognizing hand 
shape and position of a fifth embodiment is a type cor- 
responding to a case where the input hand image is 
such time-varying image obtained by photographing a 
hand successively moved to indicate a meaning (here- 
inafter, hand movement image). The device of the fifth 
embodiment provides a method for catching the mean- 
ing of such hand movement. Therein, featured points 
are first extracted respectively for various hand move- 
ments, and are stored together with meanings thereof. 
Thereafter, featured points of the input hand movement 
image are compared with the already-stored featured 
points so as to find the meaning. 
[0169] In the fifth embodiment, the input hand
movement image supposedly includes not only a
signer's hand but also his/her upper body or whole body.
The signer may be photographed from various directions,
e.g., from the front, obliquely above, or the side, and the
device of the fifth embodiment can be effective for
images photographed from any direction.
[0170] FIG. 16 is a block diagram showing the
structure of the device for recognizing hand shape and
position of the fifth embodiment. In FIG. 16, the device
of the fifth embodiment is structured, similarly to the
device of the second embodiment, by the storage part
construction system 1 and the shape/position
recognition system 2.

[0171] In FIG. 16, the storage part construction
system 1 is provided with the hand image normalization
part 11, the eigenvector storage part 14, the
eigenspace calculation part 13, the hand shape image
information storage part 12B, the eigenspace projection
part 15, the cluster information storage part 17A, and
the cluster evaluation part 16, while the shape/position
recognition system 2 is provided with a hand region
detection part 28, a hand movement segmentation part
29, a hand image cutting part 30, the hand image
normalization part 21, the eigenspace projection part 22,
the maximum likelihood cluster judgement part 25, an
identification operation part 33A, a series registration
part 31, a series identification dictionary 32, and a data
passage control part 34A.

[0172] As shown in FIG. 16, the device of the fifth
embodiment is provided with, in the shape/position
recognition system 2, the hand region detection part 28,
the hand movement segmentation part 29, and the
hand image cutting part 30 in the preceding stage to the
hand image normalization part 21, and the series
registration part 31, the series identification dictionary 32,
the identification operation part 33A, and the data
passage control part 34A as alternatives to the image
comparison part 26 in the device of the second
embodiment.

[0173] Other constituents in the device of the fifth
embodiment are the same as those in the device of the
second embodiment, and are denoted by the same
reference numerals and not described again.
[0174] Herein, the storage part construction system
1 in the fifth embodiment is the storage part
construction system 1 found in the device of the second
embodiment, and is structured without the series identification
dictionary 32. Note that the names "storage part
construction system 1" and "shape/position recognition
system 2" are used in the fifth embodiment only to show
relevancy to the second embodiment, and do not preclude
the dictionary (series identification dictionary 32) from
being created in the actual internal processing, for
example in the shape/position recognition system 2.

[0175] The structure of the shape/position recognition
system 2 in the fifth embodiment is described, focusing
on the constituents that differ from those in the
device of the second embodiment.
[0176] The hand region detection part 28 receives
the hand movement image, and detects a hand region
from each image thereof. The hand movement
segmentation part 29 finds any change point in hand shape and
hand position for the hand movement image, and then
creates a hand movement image structured by one or
more images including the change points. From the
hand movement image, the hand image cutting part 30
cuts any peripheral region where the hand is observed,
and creates a hand image series for output to the hand
image normalization part 21. The maximum likelihood
cluster judgement part 25 outputs a cluster series
corresponding to the hand image series to the series
registration part 31. The series registration part 31 then
registers, in the series identification dictionary 32, the
cluster series together with the meaning of the hand
movement image. The identification operation part 33A
identifies what the hand movement image means by
comparing the cluster series provided by the maximum
likelihood cluster judgement part 25 with the cluster
series registered in the series identification dictionary 32.
The data passage control part 34A performs control such
that, for registration, the cluster series from the maximum
likelihood cluster judgement part 25 is forwarded to the
series registration part 31, and for recognition, to the
identification operation part 33A.

[0177] Next, by referring to FIGS. 17 to 20, the 
method for recognizing hand shape and position carried 
out by the device of the fifth embodiment is operationally 
described stepwise. FIG. 17 shows a concept of the 
processing carried out by the hand region detection part 
28, the hand movement segmentation part 29, and the 
hand image cutting part 30 in FIG. 16. FIG. 18 shows 
the hand image series in FIG. 16 and an exemplary 
cluster series obtained therefrom. FIGS. 19 and 20
each show an exemplary storage format of the series 
identification dictionary 32 in FIG. 16. FIG. 19 shows an 
exemplary simple storage format in a table, and FIG. 20 
shows an exemplary storage format based on a hidden 
Markov model. 

[0178] In the fifth embodiment, the storage part 
construction system 1 operates similarly to the one in 
the device of the second embodiment, and is not 
described again. 

[0179] The shape/position recognition system 2 
operates in the following two modes: 

1. Registration mode (first registration mode)

[0180] This is a mode of registering, in the series 
identification dictionary 32, the cluster series obtained 
from the input hand movement image together with the 
meaning of the input hand movement image. 

2. Identification mode 

[0181] This is a mode of identifying what the input 
hand movement image means according to the cluster 
series obtained therefrom. The identification mode is 
equivalent to the hand shape/position recognition car- 
ried out in the second embodiment, and the meaning of 
the hand movement is recognized through the eigen- 
vector storage part 14, the cluster information storage 
part 17A, and the series identification dictionary 32. 
[0182] As to mode selection, the data passage control
part 34A is controlled so as to select one of the modes.
Each mode is operationally described stepwise below.
[0183] First, it is described how the hand region
detection part 28, the hand movement segmentation
part 29, and the hand image cutting part 30 operate.
Such operation is common to both modes.
[0184] The hand movement image (a in FIG. 17) is
provided to the hand region detection part 28. The hand
region detection part 28 then detects a region where the
hand is observed (hand region) for every image. It is
assumed herein that the hand region is easily isolated
from the background. Therefore, each image is simply
binarized, and the region whose area is closest to the
area expected of the hand is detected as the hand region.
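
The binarization step of paragraph [0184] can be illustrated with a minimal Python sketch. It is not the implementation of the hand region detection part 28; the function name, the fixed threshold, the use of 4-connected components, and the `expected_area` parameter are all assumptions introduced for illustration.

```python
from collections import deque

def detect_hand_region(gray, threshold, expected_area):
    """Binarize `gray` (a list of rows of ints) and return the bounding box
    (top, left, bottom, right) of the connected foreground region whose
    pixel count is closest to `expected_area`; None if no region exists.
    All names and parameters here are hypothetical."""
    h, w = len(gray), len(gray[0])
    binary = [[px >= threshold for px in row] for row in gray]
    seen = [[False] * w for _ in range(h)]
    best = None  # (area difference, bounding box)
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            # Flood-fill one 4-connected foreground component.
            queue = deque([(y, x)])
            seen[y][x] = True
            area, top, left, bottom, right = 0, y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                area += 1
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            diff = abs(area - expected_area)
            if best is None or diff < best[0]:
                best = (diff, (top, left, bottom, right))
    return None if best is None else best[1]
```

The bounding box returned here corresponds to the region that the hand image cutting part 30 would later cut from a key frame.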

[0185] The hand movement segmentation part 29 
finds, for the hand movement image provided from the 
hand region detection part 28, an image(s) vital for hand 
shape and position (hereinafter, key frame). The key 
frame herein should be an image in which hand shape
and position are perceivable to a human being. A
human being generally cannot perceive hand shape
and position in the hand movement image because of
afterimages. By taking this into consideration, the key frame
to be found in the hand movement segmentation part 29 
should be relatively small in hand movement. In this 
manner, the hand movement segmentation part 29 finds 
one or more key frames, and the key frames are for- 
warded to the hand image cutting part 30 as the hand 
movement image (b in FIG. 17). 

[0186] In order to find the frame relatively small in
hand movement, the hand regions of the respective
hand images structuring the hand movement image
may be referred to for any difference in area or change
thereamong; the location of the hand in the hand region
may be traced so as to find a point where the hand is
relatively still (any frame in which a curvature of the
hand trace is relatively large is included); or the point
where the hand is relatively still may be found by utilizing
information on a time difference image obtained from
the hand movement image. Alternatively, every image
included in the hand movement image may be the key frame.
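
As one illustration of the second approach in paragraph [0186] (tracing the hand location to find where the hand is relatively still), the following Python sketch selects key frames from a trace of hand locations. The function name and the `max_speed` threshold are assumptions; the specification does not fix a particular criterion.

```python
import math

def find_key_frames(centroids, max_speed):
    """Return indices of frames in which the hand is relatively still.

    centroids: list of (x, y) hand locations, one per frame (hypothetical input).
    A frame qualifies when the displacement into the frame and the displacement
    out of it are both at most `max_speed` pixels per frame.
    """
    keys = []
    for i, current in enumerate(centroids):
        prev = centroids[i - 1] if i > 0 else current
        nxt = centroids[i + 1] if i + 1 < len(centroids) else current
        speed = max(math.dist(prev, current), math.dist(current, nxt))
        if speed <= max_speed:
            keys.append(i)
    return keys
```

For a trace that rests, moves, and rests again, such as `[(0, 0), (0, 0), (5, 0), (10, 0), (10, 0), (10, 0)]` with `max_speed=1`, only frames at rest (indices 0, 4, and 5) are kept.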
[0187] Respectively from the key frame(s) found for
the hand movement image by the hand movement seg- 
mentation part 29, the hand image cutting part 30 cuts 
the hand region detected by the hand region detection 
part 28, and then creates the hand image series (c in 
FIG. 17 and a in FIG. 18). The images structuring the 
hand image series are each equivalent to the hand 
images in the second embodiment. The hand image 
series created by the hand image cutting part 30 is
forwarded to the hand image normalization part 21.
[0188] Thereafter, in a similar manner to the second
embodiment, the hand image normalization part 21, the
eigenspace projection part 22, and the maximum
likelihood cluster judgement part 25 each process
the key frame(s) of the hand image series.
Accordingly, the key frame(s) are each assigned to
the appropriate cluster, and are outputted as the
cluster series (b in FIG. 18).
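
The conversion of a hand image series into a cluster series can be sketched as follows. This is only an illustration: the details of the maximum likelihood cluster judgement are given in the second embodiment and are not reproduced here, so the nearest cluster center in the eigenspace is used as a stand-in for the likelihood judgement. All function names and the flat-list image representation are assumptions.

```python
def project(image, mean, eigenvectors):
    """Project a normalized image (flat list of pixel values) onto the
    eigenspace spanned by `eigenvectors` after subtracting `mean`."""
    centered = [p - m for p, m in zip(image, mean)]
    return [sum(c * e for c, e in zip(centered, vec)) for vec in eigenvectors]

def nearest_cluster(coords, cluster_centers):
    """Return the ID of the cluster whose center is closest to `coords`.
    Assumption: distance to the center stands in for the likelihood."""
    def sq_dist(cluster_id):
        center = cluster_centers[cluster_id]
        return sum((a - b) ** 2 for a, b in zip(coords, center))
    return min(cluster_centers, key=sq_dist)

def to_cluster_series(key_frames, mean, eigenvectors, cluster_centers):
    """Map each key frame to a cluster ID, preserving frame order."""
    return [nearest_cluster(project(f, mean, eigenvectors), cluster_centers)
            for f in key_frames]
```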

[0189] Such processing is common to both
modes and is carried out to find the cluster series
corresponding to the hand movement image.
[0190] Next, the processing not common to both
modes is described.
[0191] First of all, the processing in the registration
mode is described.

[0192] In the registration mode, the cluster series
outputted from the maximum likelihood cluster
judgement part 25 is defined as a series which
characterizes the hand movement, and is registered
(stored) in the series identification dictionary 32 together
with the meaning of the hand movement.
[0193] Also, the data passage control part 34A so
controls that the cluster series provided by the
maximum likelihood cluster judgement part 25 is forwarded
to the series registration part 31.
[0194] The series registration part 31 then
registers, in the series identification dictionary 32, the cluster
series together with the meaning of the hand movement
corresponding thereto. Although the series identification
dictionary 32 may take various storage formats,
the two storage formats shown in FIGS. 19 and 20 are
exemplarily adopted for description.
[0195] FIG. 19 shows a storage format in which the
cluster series obtained by the maximum likelihood
cluster judgement part 25 is registered, as it is, together with
the meaning of the hand movement. As is known from
FIG. 19, one meaning is not limited to one cluster
series. This is because the hand movement may
slightly vary in speed and shape depending on who the
signer is. The storage format of this type is created by
repeating the registration processing several times
for one hand movement.
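
A table-type dictionary of the kind shown in FIG. 19 can be pictured with the following Python sketch, in which one meaning may hold several cluster series obtained by repeated registration. The class and method names are hypothetical, and the cluster IDs in any usage are illustrative values, not those of FIG. 19.

```python
from collections import defaultdict

class SeriesDictionary:
    """Hypothetical table-type series identification dictionary:
    each meaning maps to every cluster series registered for it."""

    def __init__(self):
        self._table = defaultdict(list)

    def register(self, meaning, cluster_series):
        """Register one cluster series for a meaning, ignoring duplicates."""
        series = tuple(cluster_series)
        if series not in self._table[meaning]:
            self._table[meaning].append(series)

    def lookup_exact(self, cluster_series):
        """Return all meanings having an identical registered series."""
        series = tuple(cluster_series)
        return [m for m, variants in self._table.items() if series in variants]
```

Repeating `register` for the same meaning with slightly different series reflects the variation in speed and shape among signers noted above.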

[0196] FIG. 20 shows a storage format based on
the hidden Markov model (HMM) exemplarily as a state
transition model. The hidden Markov model is a
technique popular for speech recognition, in which, as
shown in FIG. 19, the cluster series plurally applicable
to one meaning is integrally represented in one state
transition model. The technical details of the hidden
Markov model are found, for example, in a technical
document, Nakagawa, "Speech Recognition by
Established Model" published by Korona sha (phonetically
written) and edited by The Electronic Information
Communications Society. The storage format in FIG. 20 is
created according to the document. Note that, in FIG.
20, scalar values each indicate a probability of the state
transition for S1 to S3, while vector values each indicate
a probability of output under conditions of the state
transition in the clusters 1 to 5.

[0197] To create the series identification dictionary
32, it is common to register the hand shape and position
obtained from the images without any change. If this is
the case, however, as described in the second
embodiment, some hand images are not distinguishable when
being analogous in hand position from a certain
direction but different in hand shape, or when being
analogous in hand shape but different in hand position.
Therefore, for correct recognition, the images need to
be compared with one another for matching as in the
third embodiment or to be obtained from several
cameras as in the fourth embodiment.
[0198] Therefore, instead, in the fifth embodiment,
the cluster series is structured by the clusters, into
which the images are classified according to their
similarity, and such cluster series is registered in the series
identification dictionary 32. In this manner, the hand shape
and position recognition can be done with higher accuracy.
[0199] Next, the processing in the identification 
mode is described. 

[0200] In the identification mode, the series identifi- 
cation dictionary 32 is used to catch the meaning of the 
input hand movement image. 

[0201] In this mode, the data passage control part 
34A so controls that the cluster series provided by the 
maximum likelihood cluster judgement part 25 is for- 
warded to the identification operation part 33A. 
[0202] The identification operation part 33A
compares the cluster series provided by the maximum
likelihood cluster judgement part 25 with the several cluster
series registered in the series identification dictionary 
32, and determines which registered cluster series is 
identical or similar thereto. Thereafter, the identification 
operation part 33A extracts the meaning corresponding 
to the determined cluster series from the series identifi- 
cation dictionary 32 for output. 
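
The comparison for an "identical or similar" cluster series in paragraph [0202] is not tied to a specific similarity measure. As one possible illustration, the Python sketch below uses the edit distance between series; the function names, the dictionary layout, and the `max_distance` tolerance are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of cluster IDs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def identify(cluster_series, dictionary, max_distance=1):
    """Return the meaning of the closest registered series, or None when
    no series is within `max_distance` edits.
    dictionary: {meaning: [registered cluster series, ...]} (hypothetical)."""
    best_meaning, best_dist = None, max_distance + 1
    for meaning, variants in dictionary.items():
        for registered in variants:
            d = edit_distance(cluster_series, registered)
            if d < best_dist:
                best_meaning, best_dist = meaning, d
    return best_meaning
```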

[0203] As is known from the above, according to the 
device for recognizing hand shape and position of the 
fifth embodiment, by using cluster information similar
to that in the second embodiment, the meaning
of a hand movement made successively to carry a
meaning in gesture or sign language is stored in advance
together with the cluster series. Thereafter, at the
time of recognizing the hand movement image, the
cluster series is referred to for outputting the stored
meaning.

[0204] In this manner, the hand movement succes- 
sively made to carry the meaning in gesture or sign lan- 
guage can be recognized with higher accuracy, and 
accordingly can be correctly caught in meaning. 
[0205] Note that, although the method for recogniz- 
ing hand images by using the key frame(s) is described 
in the fifth embodiment, it is not restrictive but can be 
effectively applied to a case where every image is 
regarded as the key frame, a case where images sam- 
pled at a constant interval are regarded as the key 
frames, or a case where images only at the start and the 
end of the hand movement are regarded as the key 
frames. 

(Sixth Embodiment) 

[0206] In a sixth embodiment of the present
invention, in the storage part construction system 1 in the fifth
embodiment, the hand image series obtained from the
hand movement image is stored with the meaning
thereof instead of storing the hand images varied in
hand shape and position in the hand shape image
information storage part 12B.

[0207] FIG. 21 is a block diagram showing the
structure of the device for recognizing hand shape and
position of the sixth embodiment. As is known from FIG.
21, unlike the device of the fifth embodiment where the
storage part construction system 1 and the
shape/position recognition system 2 are separately provided, the
device of the sixth embodiment is one integrated unit.
[0208] In FIG. 21, the device of the sixth
embodiment is provided with the hand region detection part 28,
the hand movement segmentation part 29, the hand
image cutting part 30, the hand image normalization
part 21, the eigenspace projection part 22, the
maximum likelihood cluster judgement part 25, the
identification operation part 33A, the series identification
dictionary 32, a data passage control part 34B, a hand
image registration part 35, a series reconstruction part
36, the eigenspace calculation part 13, the eigenvector
storage part 14, a hand shape image information
storage part 12C, the cluster evaluation part 16, and the
cluster information storage part 17A.
[0209] As shown in FIG. 21, unlike the device of the
fifth embodiment, the device of the sixth embodiment is
one unit where the storage part construction system 1
and the shape/position recognition system 2 are
integrated. Therefore, the hand image normalization part 11 and
the hand image normalization part 21, and the
eigenspace projection part 15 and the eigenspace
projection part 22 are respectively integrated therein, and
the hand shape image information storage part 12C is
provided as an alternative to the hand shape image
information storage part 12B, the data passage control
part 34B as an alternative to the data passage control
part 34A, and the hand image registration part 35 and the
series reconstruction part 36 as alternatives to the series
registration part 31.
[0210] Other constituents in the device of the sixth
embodiment are the same as those in the device of the
fifth embodiment, and are denoted by the same
reference numerals and not described again.
[0211] First, the device of the sixth embodiment is
structurally described, focusing on the constituents
that differ from those in the device of the fifth embodiment.

[0212] The hand image registration part 35
registers, in the hand shape image information storage part
12C, the hand image series corresponding to the hand
movement image provided by the hand image
normalization part 21 together with the meaning of the hand
image series. The hand shape image information
storage part 12C stores the hand shape image series (hand
image series) corresponding to the registered hand
movement image therein together with the meaning of
the hand shape image series. The hand shape image
information storage part 12C also stores, in a similar
manner to the hand shape image information storage
part 12B in the fifth embodiment, the projection
coordinates obtained by projecting the hand shape images
onto the eigenspace and the cluster IDs. Based on the
information stored in the hand shape image information
storage part 12C, the series reconstruction part 36
registers, in the series identification dictionary 32, the
cluster series corresponding to each stored hand shape
image series together with the meaning thereof. The data
passage control part 34B then so controls that, for
registration, the hand image series from the hand image
normalization part 21 is forwarded to the hand image
registration part 35, and for recognition, to the
eigenspace projection part 22.

[0213] Then, by referring to FIG. 22, the method for 
recognizing hand shape and position carried out by the 
device of the sixth embodiment is operationally 
described stepwise next below. FIG. 22 is an exemplary 
storage table provided in the hand shape image
information storage part 12C in FIG. 21.
[0214] The device of the sixth embodiment oper- 
ates in the following two modes: 

1. Registration mode (second registration mode)

[0215] This is a mode of registering, in the series 
identification dictionary 32, the cluster series obtained 
from the input hand movement image together with the 
meaning of the images. In the registration mode, infor- 
mation for storage is stored in the hand shape image 
information storage part 12C, the eigenvector storage 
part 14, and the cluster information storage part 17A, 
and the processing therein is equivalent to the storage 
part construction system 1 in the second embodiment. 
In detail, in the hand shape image information storage 
part 12C, the hand image series (hand shape image 
series) obtained from the input hand movement image 
is stored together with the meaning of the images, and 
eigenspace calculation and cluster evaluation are per- 
formed according to the stored hand shape images. 
Thereafter, the obtained cluster series and the meaning 
of the images are registered in the series identification 
dictionary 32. 

2. Identification mode: hand movement identification 

[0216] This is a mode of identifying what the input 
hand movement image means according to the cluster 
series obtained therefrom. The identification mode is 
also equivalent to the hand shape/position recognition 
carried out in the second embodiment, and the meaning 
of the hand movement is recognized through the eigen- 
vector storage part 14, the cluster information storage 
part 17A, and the series identification dictionary 32.
[0217] As to mode selection, the data passage control
part 34B is controlled so as to select one of the modes.
Each mode is operationally described stepwise below.
[0218] First, the processing in the registration mode
is described.

[0219] As described in the foregoing, the hand
movement image is processed, in a similar manner to the fifth
embodiment, in the hand region detection part 28, the
hand movement segmentation part 29, the hand image
cutting part 30, and the hand image normalization part
21. The hand image series corresponding to the input
hand movement image is thus outputted from the hand
image normalization part 21. The data passage control
part 34B so controls that the hand image series is
forwarded to the hand image registration part 35.
[0220] Next, the hand image registration part 35
stores, in the hand shape image information storage
part 12C, the hand image series provided by the hand
image normalization part 21 together with the meaning
of the hand movement corresponding to the hand image
series. FIG. 22 shows an exemplary storage table
provided in the hand shape image information storage part
12C. As shown in FIG. 22, unlike the hand shape image
information storage part 12B in the second embodiment,
which stores the shape information and the position
information, the hand shape image information storage
part 12C stores information (steps) such as a number
of the hand image series, the meaning of the hand
movement image corresponding to the hand image series,
and an ordinal rank of the hand shape image included
in the series. In a case where both hands abut each
other in an image, such image is registered as one
hand shape image.
[0221] With respect to the hand shape images
stored in the hand shape image information storage
part 12C, the eigenspace calculation part 13, the
eigenvector storage part 14, the eigenspace projection part
22, and the cluster evaluation part 16 all operate in a
similar manner to the second embodiment. Thereafter,
the eigenvector storage part 14 stores eigenvectors
corresponding to the hand shape images and the cluster
information storage part 17A stores cluster information.
The hand shape image information storage part 12C
stores the eigenspace projection coordinates and
the cluster IDs.

[0222] After the information storage into the hand
shape image information storage part 12C is
completed, the series reconstruction part 36 registers the
cluster series of each stored hand image series and the
corresponding meaning in the series identification
dictionary 32.
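
The work of the series reconstruction part 36 can be pictured as regrouping the rows of a storage table into ordered cluster series. The following Python sketch assumes a row layout with the hypothetical keys `series_no`, `meaning`, `step` (the ordinal rank within the series), and `cluster_id`; the actual layout of FIG. 22 may differ.

```python
def reconstruct_series(rows):
    """Rebuild cluster series from storage-table rows.

    rows: iterable of dicts with the hypothetical keys
          series_no, meaning, step, cluster_id.
    Returns {series_no: (meaning, [cluster IDs ordered by step])}.
    """
    grouped = {}
    for row in rows:
        _, steps = grouped.setdefault(row["series_no"], (row["meaning"], []))
        steps.append((row["step"], row["cluster_id"]))
    return {
        series_no: (meaning, [cluster_id for _, cluster_id in sorted(steps)])
        for series_no, (meaning, steps) in grouped.items()
    }
```

Each resulting pair of meaning and cluster series is what would then be registered in the series identification dictionary 32.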

[0223] Next, the operation in the identification mode
is described. 

[0224] In the identification mode, the identification
operation part 33A compares the cluster series
provided by the maximum likelihood cluster judgement part
25 with a plurality of cluster series registered in the
series identification dictionary 32, and then determines
which of the registered cluster series is identical or
similar thereto. Thereafter, the identification operation part
33A extracts, for output, from the series identification
dictionary 32, the meaning of the cluster series
determined to be identical or similar.

[0225] As described in the foregoing, according to 
the device for recognizing hand shape and position of 
the sixth embodiment, the images can surely be
photographed under the same environment, and accordingly
there is no more need to newly acquire images for
recognition. Consequently, the images can be recognized
with higher accuracy.
[0226] Further, the device of the sixth embodiment
can be further provided with both the series registration
part 31 and the data passage control part 34A in the
fifth embodiment, and the registration of the cluster
series and the corresponding meaning in the series
identification dictionary 32 may be done in both the first
and the second registration modes.
[0227] With such structure, even if the hand shape
image information storage part 12C is provided as a
fixed database, it becomes possible to register a new
hand movement image (that is, to update the series
identification dictionary 32) in the first registration mode.

(Seventh Embodiment)

[0228] A seventh embodiment of the present
invention provides a method of catching the meaning of the
hand movement when the input hand image is obtained
by photographing the hand successively moving to
convey a meaning in gesture or sign language. This can be
implemented by using the device of the fifth or sixth
embodiment as a module for a device for recognizing
gesture or sign language.

[0229] Herein, the present invention is exemplarily
applied to recognize sign language. In sign language,
the meaning is conveyed through many elements
including the spatial location of the hand, the hand
movement, and the hand shape and position, for example.
As to the hand shape, it may also be concerned with the
hand shape at the start and at the end of the sign
language word (right hand or left hand only, or both). FIG. 23
exemplarily shows some sign language words
described with such elements. In FIG. 23, for a sign
language word having the meaning of "say", the index
finger of the right hand is pointed up and is brought to the
mouth. Thereafter, the index finger is pushed forward.
For another sign language word having the meaning of
"like", the thumb and the index finger of the right hand
are both extended and then are brought to the chin part.
Thereafter, both fingers are moved downward
while being closed.

[0230] Therefore, in the device of the seventh
embodiment, the successive hand movement is
restricted by some comprehensive characteristics, e.g.,
the spatial location and the hand movement in order to
recognize hand images with higher accuracy.
[0231] FIG. 24 is a block diagram showing the
structure of the device for recognizing hand shape and
position according to the seventh embodiment. In FIG. 24,
the device of the seventh embodiment is provided with
the hand image registration part 35, the eigenvector
storage part 14, the eigenspace calculation part 13, the
hand shape image information storage part 12C, the
cluster information storage part 17A, the cluster
evaluation part 16, the series reconstruction part 36, the hand
region detection part 28, the hand movement
segmentation part 29, the hand image cutting part 30, the hand
image normalization part 21, the eigenspace projection
part 22, the maximum likelihood cluster judgement part
25, an identification operation part 33B, the series
identification dictionary 32, the data passage control part
34B, a comprehensive movement recognition part 37,
and a restriction condition storage part 38.
[0232] The device of the seventh embodiment 
shown in FIG. 24 is further provided with the compre- 
hensive movement recognition part 37 and the restric- 
tion condition storage part 38 compared with the device 
of the sixth embodiment, and is provided with the
identification operation part 33B as an alternative to the
identification operation part 33A. 
[0233] Other constituents in the device of the sev- 
enth embodiment are the same as those in the device of 
the sixth embodiment, and are denoted by the same ref- 
erence numerals and not described again. 
[0234] First, the restriction condition storage part 38 
is prestored with restriction conditions for restricting the 
hand shape and position according to the hand move- 
ment carrying a meaning such as sign language words. 
As shown in FIG. 23, by taking the sign language word 
having the meaning of "say" as an example, the index 
finger should be pointed up at the start and the end, and 
hand position, the spatial hand location, and the hand 
movement should be as described in the foregoing. 
Herein, the sign language word "say" is signed only with
the right hand; therefore, in FIG. 23, no description is
made for the left hand.

[0235] The hand movement image is forwarded to 
both the comprehensive movement recognition part 37 
and the hand region detection part 28. The comprehen- 
sive movement recognition part 37 extracts, in a similar 
manner to the hand region detection part 28, the hand 
region from each of the images structuring the input 
hand movement image. Thereafter, the comprehensive 
movement recognition part 37 traces the hand move- 
ment in the hand regions and determines the hand loca- 
tion with respect to the body. The information on the 
hand trace and location is forwarded to the identification 
operation part 33B. The comprehensive movement rec- 
ognition part 37, exemplarily, traces the hand and deter- 
mines the hand location in a manner disclosed in "Hand 
Movement Recognition Device" (11-174948/99-174948)
filed by the applicant of the present invention.

[0236] The hand movement image provided to the 
hand region detection part 28 is processed, in a similar 
manner to the sixth embodiment, in the hand movement 
segmentation part 29, the hand image cutting part 30, 
the hand image normalization part 21, the eigenspace 
projection part 22, and the maximum likelihood cluster 
judgement part 25, and then the cluster series
corresponding to the hand movement image is provided from
the maximum likelihood cluster judgement part 25 to the
identification operation part 33B.

[0237] From the data stored in the restriction
condition storage part 38, the identification operation part
33B first extracts one or more sign language/gesture
words matching the hand movement recognition result
(information on the hand trace and location) provided by
the comprehensive movement recognition part 37.
Then, the identification operation part 33B compares
the cluster series provided by the maximum likelihood
cluster judgement part 25 with the several cluster series
registered in the series identification dictionary 32, and
determines which of the registered cluster series is
identical or similar thereto. Thereafter, the identification
operation part 33B extracts the meaning(s) of the
determined cluster series from the series identification
dictionary 32. By referring to the extracted sign
language/gesture word(s) and the meaning(s), the
identification operation part 33B outputs the one meaning
closest to the input hand movement image.
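
The combination of restriction conditions and cluster series matching in paragraph [0237] can be illustrated with the simplified Python sketch below. It filters the candidate words by the comprehensive movement first and then scores only those candidates, which is one possible way of combining the two results, not necessarily the one used by the identification operation part 33B. The function name, the `(trace, location)` encoding of a restriction condition, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions.

```python
from difflib import SequenceMatcher

def recognize(cluster_series, movement, restrictions, dictionary):
    """Return the word whose restriction condition matches `movement` and
    whose registered cluster series is most similar to `cluster_series`.

    restrictions: {word: (trace, location)}         (hypothetical encoding)
    dictionary:   {word: [registered series, ...]}  (hypothetical layout)
    Returns None when no word satisfies the restriction condition.
    """
    allowed = [w for w, cond in restrictions.items() if cond == movement]
    best_word, best_score = None, 0.0
    for word in allowed:
        for registered in dictionary.get(word, []):
            score = SequenceMatcher(None, cluster_series, registered).ratio()
            if score > best_score:
                best_word, best_score = word, score
    return best_word
```

With this arrangement, two words whose cluster series are nearly the same can still be told apart when their hand trace or spatial location differs.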
[0238] As is known from the above, according to the
device for recognizing hand shape and position of the
seventh embodiment, the restriction conditions relevant
to the comprehensive hand movement are additionally
imposed, and the hand movement image is defined by
meaning.

[0239] In this manner, the hand movement image 
can be recognized with higher accuracy. 
[0240] Note that, in the seventh embodiment, the
comprehensive movement recognition part 37, the
restriction condition storage part 38, and the
identification operation part 33B are provided to the device of the
sixth embodiment. However, these constituents may be
provided to the device of the fifth embodiment, or to a
device being a combination of the devices of the fifth
and sixth embodiments.

(Eighth Embodiment) 

[0241] An eighth embodiment of the present
invention provides a method for detecting a hand region with
higher accuracy by also utilizing the cluster information
for detecting the hand region in the hand region
detection part 28 in the fifth to seventh embodiments.

[0242] FIG. 25 is a block diagram showing the
detailed structure of the hand region detection part
provided in the device of the eighth embodiment. In FIG.
25, a hand region detection part 48 in the eighth
embodiment is provided with a possible region cutting part 39,
a masking region storage part 40, an image
normalization part 41, the eigenspace projection part 22, the
maximum likelihood cluster judgement part 25, and a region
determination part 42.

[0243] Other constituents in the device of the eighth
embodiment are the same as those in the devices of the
fifth to seventh embodiments, and are denoted by the
same reference numerals and not described again.
[0244] First, the hand region detection part 48 in the
device of the eighth embodiment is structurally
described.

[0245] The possible region cutting part 39 cuts,
from each of the images structuring the input hand
movement image, a region that is possibly a hand
region (possible hand region). The possible region cutting
part 39 then forwards, to the region determination part 42,
information about the location of the possible hand regions. The
masking region storage part 40 stores a masking region 
used to extract only a predetermined region from each 
of the possible hand regions cut by the possible region 
cutting part 39. The image normalization part 41 nor- 
malizes, in size, the possible hand regions cut by the 
possible region cutting part 39, and thereon, superim- 
poses the masking region stored in the masking region 
storage part 40 for normalization in contrast. The possi- 
ble hand region images are thus acquired. The 
eigenspace projection part 22 projects, as in the fifth to 
seventh embodiments, the possible hand region images 
onto the eigenspace. The maximum likelihood cluster 
judgement part 25 determines, as in the fifth to seventh 
embodiments, the cluster closest to each of the 
eigenspace projection coordinates obtained by the 
eigenspace projection part 22. The region determina- 
tion part 42 applies the likelihood of the clusters to every 
possible hand region image, and then outputs the loca- 
tion of the possible hand region image having the high- 
est likelihood and an index thereof. 
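
The selection made by the region determination part 42 can be sketched in Python as follows. The likelihood model is an assumption: it decays with the squared distance between the eigenspace projection coordinates and a cluster center, whereas the actual likelihood used by the maximum likelihood cluster judgement part 25 is defined in the second embodiment. The function name and data layout are hypothetical.

```python
import math

def determine_region(candidates, cluster_centers):
    """Pick the possible hand region whose projection is most likely
    to belong to some cluster.

    candidates: list of (location, projection_coords) pairs (hypothetical).
    cluster_centers: {cluster_id: center_coords}.
    Returns (location, cluster_id) of the best candidate, or None.
    """
    best = None  # (likelihood, location, cluster_id)
    for location, coords in candidates:
        for cluster_id, center in cluster_centers.items():
            sq_dist = sum((a - b) ** 2 for a, b in zip(coords, center))
            likelihood = math.exp(-sq_dist)  # assumed likelihood model
            if best is None or likelihood > best[0]:
                best = (likelihood, location, cluster_id)
    return None if best is None else (best[1], best[2])
```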
[0246] By referring to FIGS. 26 to 28, it is now 
described stepwise how the hand region detection part 
48 in the device of the eighth embodiment detects the 
hand region. FIG. 26 shows an exemplary technique, 
carried out by the possible region cutting part 39 in FIG. 
25, for determining a possible hand region. FIG. 26 
shows three techniques: a technique for simple scan- 
ning; a technique for cutting a possible hand region 
based on color information etc.; and a technique for cut- 
ting a hand region by estimating the current location of 
the hand region based on the preceding location 
thereof. FIG. 27 is a schematic diagram showing the 
processing carried out by the image normalization part 
41 in FIG. 25. FIG. 28 shows an exemplary masking 
region stored in the masking region storage part 40 in 
FIG. 25. 

[0247] First, the possible region cutting part 39 
obtains a possible hand region from the input hand 
movement image, and then cuts a rectangular region 
corresponding to the possible hand region. To obtain the
possible hand region, any of the three techniques shown in
FIG. 26 may be applied.
[0248] In the first and simplest technique, the size of
the possible hand region is predetermined. A rectangular
window of that size is scanned over the hand movement
image, and every region obtained thereby is regarded
as a possible hand region (a in FIG. 26). In this technique,
the size for scanning may be varied depending
on the size of the hand in the hand movement image.
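
The simple scanning of the first technique can be sketched as follows. This is only a minimal illustration: the function name `scan_windows`, the stride parameter `step`, and the example image size are assumptions made here and do not appear in the specification.

```python
from typing import Iterator, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def scan_windows(image_w: int, image_h: int,
                 win_w: int, win_h: int, step: int) -> Iterator[Window]:
    """Yield every window of the given size that lies fully inside the image."""
    for y in range(0, image_h - win_h + 1, step):
        for x in range(0, image_w - win_w + 1, step):
            yield (x, y, win_w, win_h)


# Hypothetical 8x6 image scanned with a 4x4 window and a stride of 2 pixels.
candidates = list(scan_windows(8, 6, 4, 4, 2))
```

Every yielded rectangle would then be treated as a possible hand region. Repeating the scan with several window sizes corresponds to varying the scanning size with the size of the hand.
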
[0249] In a second technique, color information (e.g.,
beige information) is utilized so that only a rectangular
region around the part of the image having that color is
cut for scanning. In this technique, by using the beige
information, only the image regions around the hand and
the face are cut as the possible hand regions (b in FIG. 26).

[0250] In a third technique, the current location of
the hand region is estimated based on information
about the preceding location of the hand region
(information fed back from the region determination part 42).
Then, the periphery of the estimated hand region is
scanned to cut the possible hand region. This technique
may be implemented in various manners. For example, the
preceding hand speed is added to the preceding hand
location to estimate the current location of the hand
region, or a Kalman filter is used to determine the hand
location (c in FIG. 26).
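
The first variant of the third technique (adding the preceding hand speed to the preceding hand location) can be sketched as below. The sketch assumes that the speed is taken as the displacement between the two most recent detected locations, and that the periphery to be scanned is a square of a hypothetical half-width `margin`; neither detail is fixed by the specification.

```python
from typing import Tuple

Point = Tuple[float, float]


def predict_location(previous: Point, before_previous: Point) -> Point:
    """Estimate the current location as previous location plus previous speed."""
    vx = previous[0] - before_previous[0]  # displacement per frame in x
    vy = previous[1] - before_previous[1]  # displacement per frame in y
    return (previous[0] + vx, previous[1] + vy)


def search_window(center: Point, margin: float) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of the area scanned around the estimate."""
    cx, cy = center
    return (cx - margin, cy - margin, cx + margin, cy + margin)


estimate = predict_location((12.0, 7.0), (10.0, 8.0))
window = search_window(estimate, 5.0)
```

Only the area returned by `search_window` would be scanned for possible hand regions, which reduces the number of candidates compared with scanning the whole image.
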

[0251] Thereafter, as shown in FIG. 27, the image
normalization part 41 normalizes, in size, the possible
hand region cut by the possible region cutting part 39.
The masking region stored in the masking region storage
part 40 is then superimposed thereon for normalization
in contrast. The masking region is superimposed on the
possible hand region because the processing is carried
out on a shape, such as a palm or a face, which cannot be
represented by a rectangular region. Taking this into
consideration, the masking region stored in the masking
region storage part 40 is preferably a geometric mask
(a mask in a simple geometric pattern, e.g., circular or
elliptic) as a in FIG. 28, or a mask created from learning
images (a mask obtained by subjecting a stack of learning
images to an OR operation) as b in FIG. 28.

[0252] As described above, the image normalization
part 41 generates the possible hand region image
by first superimposing the masking region on the possible
hand region image, and then by normalizing the
image in contrast.
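
The two steps (superimposing the mask, then normalizing in contrast) can be sketched as follows. The specification does not state how contrast is normalized, so a linear stretch of the unmasked pixels to the range 0 to 255 is assumed here; the function names and the sample data are likewise illustrative only. The helper `or_mask` shows one way a mask could be obtained from binary learning images by an OR operation.

```python
from typing import List

Image = List[List[int]]  # rows of gray levels (or 0/1 for masks)


def or_mask(binary_images: List[Image]) -> Image:
    """Pixel-wise OR of equally sized binary learning images."""
    height, width = len(binary_images[0]), len(binary_images[0][0])
    return [[1 if any(img[r][c] for img in binary_images) else 0
             for c in range(width)] for r in range(height)]


def mask_and_normalize(region: Image, mask: Image) -> Image:
    """Zero the pixels outside the mask, then stretch the rest to 0..255."""
    kept = [v for row, mrow in zip(region, mask)
            for v, m in zip(row, mrow) if m]
    lo = min(kept, default=0)
    span = max(kept, default=0) - lo  # 0 for a flat or fully masked region
    return [[(v - lo) * 255 // span if m and span else 0
             for v, m in zip(row, mrow)]
            for row, mrow in zip(region, mask)]


mask = or_mask([[[1, 1], [0, 0]], [[0, 0], [1, 0]]])
normalized = mask_and_normalize([[50, 100], [150, 200]], mask)
```

Because the minimum and maximum are taken only over the pixels inside the mask, background pixels in the corners of the rectangle do not influence the contrast of the hand region.
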

[0253] Thereafter, in a similar manner to the fifth to
seventh embodiments, the eigenspace projection part
22 projects the possible hand region images provided by
the image normalization part 41 onto the eigenspace by
using the eigenvectors stored in the eigenvector storage
part 14, and then calculates the projection coordinates.
Then, the maximum likelihood cluster judgement part 25
determines to which of the clusters stored in the cluster
information storage part 17A each set of projection
coordinates belongs, and then forwards, for every possible
hand region image, the cluster and the likelihood thereof
to the region determination part 42.
[0254] Thereafter, the region determination part 42
selects the possible hand region having the highest
likelihood, and then outputs the location (provided by the
possible region cutting part 39) and the size of that hand
region to the hand movement segmentation part 29
as a hand region detection result.
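
The flow from projection to region selection can be illustrated with the following sketch. Several points are assumptions made for illustration and are not taken from the specification: each image is flattened into a vector, a mean vector is subtracted before projection, each cluster is represented only by its mean in the eigenspace, and a smaller Euclidean distance stands in for a higher likelihood.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def project(image_vec: Vector, mean_vec: Vector,
            eigenvectors: Sequence[Vector]) -> List[float]:
    """Coordinates of the image in the eigenspace spanned by the eigenvectors."""
    centered = [v - m for v, m in zip(image_vec, mean_vec)]
    return [sum(e * c for e, c in zip(vec, centered)) for vec in eigenvectors]


def closest_cluster(coords: Vector,
                    cluster_means: Sequence[Vector]) -> Tuple[int, float]:
    """Index of the nearest cluster mean and the distance to it."""
    dists = [math.dist(coords, m) for m in cluster_means]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def best_region(candidates, mean_vec, eigenvectors,
                cluster_means) -> Optional[Tuple[Tuple[int, int], int, float]]:
    """Pick the candidate (location, vector) that lies closest to any cluster."""
    best = None
    for location, vec in candidates:
        idx, d = closest_cluster(project(vec, mean_vec, eigenvectors),
                                 cluster_means)
        if best is None or d < best[2]:
            best = (location, idx, d)
    return best


result = best_region(
    [((0, 0), [1.0, 1.0, 9.0]), ((5, 5), [4.0, 0.5, 2.0])],
    mean_vec=[0.0, 0.0, 0.0],
    eigenvectors=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    cluster_means=[[0.0, 0.0], [4.0, 0.0]],
)
```

The returned tuple holds the location of the selected region together with its cluster, which reflects the point that the hand region and its cluster are determined in the same pass.
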
[0255] As described in the foregoing, according
to the device and the method for recognizing hand
shape and position of the eighth embodiment, the hand
region is detected by projecting the possible hand
region onto the eigenspace and then selecting the
appropriate cluster.

[0256] In this manner, the hand region and the clus- 
ter therefor can be simultaneously determined. Accord- 
ingly, the hand region can be concurrently detected with 
the hand shape/position, or with the hand movement. 
[0257] Note that, although the above-described
technique is applied to the hand movement image in
this embodiment, it may also be effectively applied to
detect any moving object in general time-varying
images.

(Ninth Embodiment) 

[0258] A ninth embodiment of the present invention 
provides a method for detecting the current hand region 
with higher accuracy in the image normalization part 41 
and the region determination part 42 in the hand region 
detection part 48 in the eighth embodiment. This is 
implemented by utilizing the cluster information at the 
preceding time. 

[0259] FIG. 29 is a block diagram showing the 
detailed structure of a hand region detection part 58 in 
the device for recognizing hand shape and position of 
the ninth embodiment. In FIG. 29, the hand region 
detection part 58 in the device of the ninth embodiment 
is provided with the possible region cutting part 39, a 
masking region storage part 45, the image normalization
part 41, the eigenspace projection part 22, the maximum
likelihood cluster judgement part 25, the region
determination part 42, a cluster transition information 
storage part 43, and a cluster transition information reg- 
istration part 44. 

[0260] As shown in FIG. 29, the hand region detection
part 58 in the device of the ninth embodiment is obtained
by adding the cluster transition information
storage part 43 and the cluster transition information
registration part 44 to the hand region detection part 48
in the device of the eighth embodiment, and by providing
the masking region storage part 45 as an alternative
to the masking region storage part 40.
[0261] Other constituents in the device of the ninth 
embodiment are the same as those in the device of the 
eighth embodiment, and are denoted by the same refer- 
ence numerals and not described again. 
[0262] By referring to FIGS. 30 and 31, it is now
described stepwise how the hand region detection part
58 in the device of the ninth embodiment detects the
hand region. FIG. 30 is a diagram exemplarily showing
the cluster transition information stored in the cluster
transition information storage part 43 in FIG. 29. As
shown in FIG. 30, the cluster transition information
storage part 43 stores a transition level map showing
a frequency of cluster transition. Specifically, for a
cluster given at a certain time t, the map shows the
cluster's possible transitions at the subsequent time t+1.
Herein, the frequency of cluster transition is referred to
as cluster transition level. FIG. 31 shows exemplary
masking regions stored in the masking region storage
part 45 in FIG. 29. As shown in FIG. 31, the masking
region storage part 45 stores the masks created from the
learning images for every cluster.
[0263] First, the possible region cutting part 39
finds, in a similar manner to the eighth embodiment, a
possible hand region from each of the images constituting
the input hand movement image, and then cuts a
rectangular region corresponding thereto.
[0264] Then, the image normalization part 41 normalizes,
in size, the possible hand regions obtained by
the possible region cutting part 39, and superimposes
thereon the masking regions stored in the masking region
storage part 45 for normalization in contrast. At this
time, by referring to the cluster transition information
storage part 43 for the cluster at the preceding time,
the image normalization part 41 selects a plurality of
clusters having a higher transition level, and then
extracts the masks applicable thereto from the
masking region storage part 45. Thereafter, the image
normalization part 41 creates a new mask by stacking
the extracted masks and subjecting them to an OR
operation, and then superimposes the created new mask
on each of the possible hand regions for normalization
in contrast. In this manner, the possible hand region
images are acquired.
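
The mask selection of this step can be sketched as follows. The sketch assumes that the transition level map is a square table indexed as `transition_map[previous][next]`, that "clusters having higher transition level" means the `top_k` highest entries in the row of the preceding cluster, and that masks are binary images keyed by cluster index; these representations and all names are illustrative assumptions.

```python
from typing import Dict, List

Mask = List[List[int]]


def likely_next_clusters(transition_map: List[List[int]],
                         previous_cluster: int, top_k: int) -> List[int]:
    """Clusters with the highest transition level from the preceding cluster."""
    row = transition_map[previous_cluster]
    ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return ranked[:top_k]


def combined_mask(masks: Dict[int, Mask], clusters: List[int]) -> Mask:
    """Pixel-wise OR of the masks belonging to the selected clusters."""
    selected = [masks[c] for c in clusters]
    height, width = len(selected[0]), len(selected[0][0])
    return [[1 if any(m[r][c] for m in selected) else 0
             for c in range(width)] for r in range(height)]


transition_map = [[5, 3, 0], [1, 4, 2], [0, 2, 6]]
masks = {0: [[1, 0], [0, 0]], 1: [[0, 1], [0, 0]], 2: [[0, 0], [1, 0]]}
clusters = likely_next_clusters(transition_map, previous_cluster=1, top_k=2)
new_mask = combined_mask(masks, clusters)
```

The combined mask covers every pixel that belongs to the hand in at least one of the likely next clusters, so it is tighter than a mask built from all learning images.
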
[0265] Thereafter, the eigenspace projection part
22 projects, in a similar manner to the eighth embodiment,
the possible hand region images from the image
normalization part 41 onto the eigenspace to obtain the
projection coordinates. The maximum likelihood cluster
judgement part 25 determines the cluster closest to
each of the eigenspace projection coordinates obtained
by the eigenspace projection part 22, and then outputs
the determined clusters and the likelihood thereof for
each of the possible hand region images to the region
determination part 42.

[0266] The region determination part 42 then refers
to the transition level map stored in the cluster transition
information storage part 43 according to the clusters
and the likelihood thereof received, for the possible
hand region images, from the maximum likelihood cluster
judgement part 25. Among the clusters having
a transition level higher than a certain value, the region
determination part 42 selects the one cluster having the
highest likelihood so as to determine the possible hand
region thereof. Thereafter, the location (provided by the
possible region cutting part 39) and the size of the
possible hand region in the selected cluster are notified
to the hand movement segmentation part 29 as a hand
region detection result. The region determination part 42
also notifies the cluster transition information
registration part 44 of the selected cluster.
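
The selection rule of the region determination part 42 can be sketched as follows. The `Candidate` record, the table layout `transition_map[previous][next]`, and the handling of the case where no cluster exceeds the threshold (returning `None`) are assumptions made for this illustration; the specification does not describe that case.

```python
from typing import List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    location: Tuple[int, int]  # provided by the possible region cutting part
    size: Tuple[int, int]
    cluster: int               # closest cluster for this candidate
    likelihood: float


def determine_region(candidates: List[Candidate],
                     transition_map: List[List[int]],
                     previous_cluster: int,
                     min_level: int) -> Optional[Candidate]:
    """Keep candidates whose transition level exceeds min_level, pick the likeliest."""
    allowed = [c for c in candidates
               if transition_map[previous_cluster][c.cluster] > min_level]
    if not allowed:
        return None  # assumed behavior; not specified in the text
    return max(allowed, key=lambda c: c.likelihood)


transition_map = [[5, 3, 0], [1, 4, 2], [0, 2, 6]]
candidates = [
    Candidate((0, 0), (32, 32), cluster=0, likelihood=0.9),
    Candidate((8, 4), (32, 32), cluster=1, likelihood=0.6),
    Candidate((16, 8), (32, 32), cluster=2, likelihood=0.7),
]
selected = determine_region(candidates, transition_map,
                            previous_cluster=1, min_level=1)
```

In this example the candidate with the highest raw likelihood is rejected because a transition from cluster 1 to cluster 0 is rare, which is how the transition information is intended to suppress implausible detections.
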
[0267] The cluster transition information registration
part 44 operates, based on the hand region detection
result obtained by the region determination part 42, only
when it receives a request to update the cluster
transition information storage part 43. Such a request is
made by a user using the system or a person who
constructs the system. On receiving the request for
update, the cluster transition information registration
part 44 updates the cluster transition information in
the cluster transition information storage part 43 based
on both the detected cluster and the previous cluster.
This update can be done, for example, by increasing the
value found in the applicable location in the transition
level map by a certain value.
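
The update described above amounts to incrementing one cell of the transition level map. A minimal sketch follows, assuming the map is a table indexed as `transition_map[previous][detected]` and that the "certain value" is a hypothetical parameter `increment`.

```python
from typing import List


def register_transition(transition_map: List[List[int]],
                        previous_cluster: int, detected_cluster: int,
                        increment: int = 1) -> None:
    """Raise the transition level for previous_cluster -> detected_cluster."""
    transition_map[previous_cluster][detected_cluster] += increment


transition_map = [[0, 0], [0, 0]]
register_transition(transition_map, 0, 1)
register_transition(transition_map, 0, 1)
register_transition(transition_map, 1, 0, increment=3)
```
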

[0268] As is known from the above, according to the 
device and the method for recognizing hand shape and 
position of the ninth embodiment, the cluster transition 
information is utilized to determine the hand region in 
the device of the eighth embodiment. In this manner, the 
hand region can be determined with higher accuracy. 
[0269] Note that, although the above-described
technique is applied to the hand movement image in
this embodiment, it may also be effectively applied to
detect any moving object in general time-varying
images.

(Tenth Embodiment) 

[0270] A tenth embodiment of the present invention
provides a method for recognizing hand shape and
position with still higher accuracy. This is implemented,
when the hand image is normalized in the hand image
normalization parts 11 and 21 in the first to seventh
embodiments, by not only deleting a wrist region therefrom
but also extracting the hand region based on color
(beige), or emphasizing the characteristics of the fingers
after normalization. In this manner, the hand can be
photographed with a non-artificial background, and from
the image taken in thereby, a hand region can be extracted.
[0271] FIG. 32 is a block diagram showing the more 
detailed structure of the hand image normalization parts 
11 and 21 provided in the device of the tenth embodi- 
ment. 

[0272] In FIG. 32, the hand image normalization 
parts 11 and 21 in the device of the tenth embodiment
are respectively provided with a color distribution stor- 
age part 61, a hand region extraction part 62, a wrist 
region deletion part 63, a region displacement part 64, 
a rotation angle calculation part 65, a region rotation 
part 66, a size normalization part 67, and a finger char- 
acteristic emphasizing part 68. 

[0273] Other constituents in the device of the tenth 
embodiment are the same as those in the devices of the 
first to seventh embodiments, and are denoted by the 
same reference numerals and not described again. 
[0274] First, the structure of the hand image
normalization parts 11 and 21 in the device of the tenth
embodiment is described.

[0275] The color distribution storage part 61 previously
stores a color distribution of a to-be-extracted
hand region. According to the color distribution, the
hand region extraction part 62 extracts the hand region.
The wrist region deletion part 63 finds in which direction
the wrist is oriented in the extracted region, and then
deletes a wrist region therefrom according to the wrist
orientation. The region displacement part 64 displaces
the hand region from which the wrist region is deleted to
a location predetermined on the image. The rotation
angle calculation part 65 determines a rotation angle of
the hand in a plane perpendicular to the optical axis.
According to the rotation angle, the region rotation part
66 rotates the hand region so that the hand is oriented
in a certain direction. The size normalization part 67
normalizes the rotated hand region to a certain
size. The finger characteristic emphasizing part 68
deletes, from the normalized hand region, a predetermined
region other than the fingers so as to emphasize the
characteristics of the fingers.
[0276] Next, by referring to FIGS. 33 to 35, it is
described stepwise how the hand image normalization
parts 11 and 21 normalize the hand image. FIG. 33
exemplarily shows the structure of a storage table
provided in the color distribution storage part 61 in FIG. 32.
Note that the storage table in FIG. 33 is an exemplary
3D look-up table (LUT) in RGB color space. FIG. 34
shows the outline of the processing carried out by the
rotation angle calculation part 65 in FIG. 32. FIG. 35
exemplarily shows the processing carried out by the
finger characteristic emphasizing part 68 in FIG. 32.
[0277] First of all, a beige region necessary to extract
the hand region from the image with the non-artificial
background is set in the color distribution storage part
61. The color distribution storage part 61 is provided
with the 3D LUT in RGB space as shown in FIG. 33. In
order to obtain the 3D LUT, a 3D color space CS defined by
three color values (axes) R, G, and B, each taking a
discrete value, is divided into divided spaces DS having a
width of d1, d2, or d3 depending on the axis. The 3D
LUT accordingly stores data values each corresponding
to the color at a barycenter (lattice point) in the divided
space DS. In other words, the 3D LUT is a table which
stores a value c {= f(r,g,b)} of a function having the 3D
coordinates (r,g,b) at each lattice point as a parameter.
[0278] In the tenth embodiment, it is assumed that, in
the color distribution storage part 61, any color region
for the hand, i.e., any region in beige, is set to a positive
value, and other color regions are set to "0".
[0279] First, the hand region extraction part 62
scans the input image, and compares the color of each
pixel thereof with the color at each of the lattice points in
the 3D LUT stored in the color distribution storage part
61. Thereafter, the hand region extraction part 62 obtains
the data value of the lattice point located closest
thereto. When the color of the pixel is beige, a positive
value is outputted, and otherwise "0" is outputted. In this
manner, the beige regions can be extracted. Herein, it may
also be effective to define, as the above-described
function f, a value obtained through an interpolation
operation on six lattice points in the vicinity of
the selected lattice point. The hand region extraction part
62 selects, among the extracted beige regions, the region
whose size is the closest to that of the hand, and deletes
the rest of the regions in the hand image as noise. The hand
image obtained thereby is outputted to the wrist region
deletion part 63.
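
The nearest-lattice-point lookup can be sketched as follows. The sketch assumes 8-bit RGB values, a LUT held as a nested list with one entry per divided space, and a beige region registered from a few hypothetical sample colors; the specification does not say how the beige entries are chosen. Because each lattice point is the barycenter of its divided space, the lattice point closest to a color is the one whose divided space contains that color, so integer division by the widths d1, d2, d3 suffices.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]  # (r, g, b), each 0..255


def build_lut(widths: Color, beige_samples: Sequence[Color],
              levels: int = 256) -> List[List[List[int]]]:
    """3D LUT with a positive value in cells hit by a beige sample, else 0."""
    n = [(levels + d - 1) // d for d in widths]  # cells along each axis
    lut = [[[0] * n[2] for _ in range(n[1])] for _ in range(n[0])]
    for r, g, b in beige_samples:
        lut[r // widths[0]][g // widths[1]][b // widths[2]] = 1
    return lut


def lookup(lut: List[List[List[int]]], widths: Color, color: Color) -> int:
    """Data value of the lattice point closest to the given color."""
    r, g, b = color
    return lut[r // widths[0]][g // widths[1]][b // widths[2]]


def extract_beige(image: List[List[Color]], lut, widths: Color) -> List[List[int]]:
    """Binary image that is 1 where the LUT value is positive."""
    return [[1 if lookup(lut, widths, px) > 0 else 0 for px in row]
            for row in image]


widths = (32, 32, 32)
lut = build_lut(widths, beige_samples=[(220, 180, 140)])
silhouette = extract_beige([[(210, 185, 135), (30, 30, 200)]], lut, widths)
```

The subsequent steps of the paragraph (choosing the region whose size is closest to that of the hand and discarding the rest as noise) would operate on the binary image returned by `extract_beige`.
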

[0280] Herein, the technique for setting the beige
region in the color distribution storage part 61 includes,
other than the above-described, a technique for setting
a constant (e.g., 255) to the beige region (in this
case, the image outputted from the hand region extraction
part 62 is a silhouette image), a technique for using the
3D LUT in which a shadowed area in the beige region is
set to a value indicating darkness and a highly reflective
area therein is set to a value indicating brightness,
or a technique for setting a color distribution of the
hand image directly to the 3D LUT without any change.
[0281] Thereafter, the wrist region deletion part 63 
finds which direction the wrist is oriented in the hand 
image extracted by the hand region extraction part 62, 
and then deletes the wrist region according thereto. 
This can be done in a similar manner to the method 
shown in FIG. 2. The region displacement part 64 
receives the hand image from which the wrist region is 
deleted, and then displaces the hand image in such a 
manner that a barycenter of the hand region coincides 
with the center of the hand image. The rotation angle 
calculation part 65 calculates an angle, as shown in 
FIG. 34, between a moment principal axis (a direction to 
which the hand is oriented, i.e. palm principal axis) in 
the hand region and a certain axis (e.g., x axis) in the 
image. 

[0282] Assuming that the hand image is f(x, y), and
the barycenter coordinates of the hand are (x_g, y_g),
M_{11}, M_{20}, and M_{02} are each obtained by the following
equation (5).

    M_{pq} = \sum_{x} \sum_{y} (x - x_g)^p (y - y_g)^q f(x, y)    (5)



[0283] Consequently, an angle \theta between the
moment principal axis and the x axis can be obtained by
the following equation (6).

    \theta = \frac{1}{2} \tan^{-1} \left( \frac{2 M_{11}}{M_{20} - M_{02}} \right)    (6)
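
The calculation of the moment principal axis can be sketched as follows. The image is assumed to be a list of rows with `image[y][x]` holding f(x, y), so the angle is measured in array coordinates where y grows with the row index. The sketch uses `math.atan2` instead of a plain arctangent of the ratio, which is a substitution made here to avoid division by zero when M_20 equals M_02; the two agree whenever M_20 is greater than M_02.

```python
import math
from typing import List

Image = List[List[float]]  # image[y][x] holds f(x, y)


def principal_axis_angle(image: Image) -> float:
    """Angle (radians) between the moment principal axis and the x axis."""
    total = sum(sum(row) for row in image)
    # Barycenter (x_g, y_g) of the hand region.
    x_g = sum(x * v for row in image for x, v in enumerate(row)) / total
    y_g = sum(y * v for y, row in enumerate(image) for v in row) / total

    def moment(p: int, q: int) -> float:
        # Central moment M_pq about the barycenter.
        return sum((x - x_g) ** p * (y - y_g) ** q * v
                   for y, row in enumerate(image)
                   for x, v in enumerate(row))

    return 0.5 * math.atan2(2.0 * moment(1, 1), moment(2, 0) - moment(0, 2))


diagonal = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
horizontal = [[1, 1, 1]]
theta_diagonal = principal_axis_angle(diagonal)
theta_horizontal = principal_axis_angle(horizontal)
```

For the diagonal test pattern the result is a quarter of pi, and for the horizontal pattern it is zero, which matches the intuitive orientation of each shape.
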

[0284] After the angle calculation, the region rota- 
tion part 66 rotates the hand region in such a manner 
that the direction of the moment principal axis coincides 
with that of the y axis. The size normalization part 67 
then normalizes, in a predetermined size, the rotated 
hand region. 

[0285] The wrist region deletion part 63, the
region displacement part 64, the rotation angle
calculation part 65, the region rotation part 66, and the size
normalization part 67 are the constituents of the hand
image normalization parts 11 and 21 in the first to seventh
embodiments. In the tenth embodiment, for
image recognition with higher accuracy, the finger
characteristic emphasizing part 68 deletes a predetermined
region other than the fingers from the normalized hand
region so as to emphasize the characteristics of the
fingers. Hereinafter, by referring to FIG. 35, the
processing carried out by the finger characteristic
emphasizing part 68 is described by way of example.

[0286] In [Example 1] in FIG. 35, the finger region is
emphasized by deleting, from the hand image, a fan-shaped
region spanning ± A degrees about the -y direction
(the direction of the moment principal axis to which the
wrist is oriented). The two sides of the fan-shaped region
extend from the barycenter in the hand region (i.e., the
center of the image). In [Example 2], the finger region is
emphasized by first forming a fan-shaped region whose
side is a distance D in the -y direction in the hand image.
Then, a part located further therefrom is deleted. In
[Example 3], the finger region is emphasized by simply
deleting a region separated by a line horizontally drawn
from a point having a predetermined length from the
wrist side. In [Example 4], the finger region is emphasized
by subjecting the hand image to polar-coordinates
conversion.
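
[Example 1] can be sketched as follows. The sketch assumes that the image is a list of rows, that the barycenter coincides with the geometric center of the array, and that the -y direction (toward the wrist) corresponds to increasing row index; these conventions and the function name are assumptions made for illustration.

```python
import math
from typing import List

Image = List[List[int]]


def delete_wrist_fan(image: Image, half_angle_deg: float) -> Image:
    """Zero the pixels lying within +/- half_angle_deg of the wrist direction."""
    height, width = len(image), len(image[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0  # apex of the fan
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            dx, dy = c - cx, r - cy
            if dx == 0 and dy == 0:
                continue  # the apex itself is left untouched
            # Angle between the pixel direction and the wrist direction
            # (pointing toward larger row indices), in degrees 0..180.
            angle = math.degrees(math.atan2(abs(dx), dy))
            if angle <= half_angle_deg:
                out[r][c] = 0
    return out


hand = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
emphasized = delete_wrist_fan(hand, half_angle_deg=30.0)
```

Pixels on the wrist side of the center are removed while the finger side is untouched, so differences between finger configurations carry more weight in the subsequent eigenspace comparison.
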

[0287] As is known from the above, according to the
device and the method for recognizing hand shape and
position of the tenth embodiment, when normalizing the
hand image, in addition to the deletion of the wrist
region, the hand region is extracted based on color
(beige), or the characteristics of the fingers are
emphasized after normalization. In this manner, the hand can
be photographed with a non-artificial background, and
from the image taken in thereby, the hand shape and
position can be recognized with higher accuracy.


(Eleventh Embodiment) 

[0288] An eleventh embodiment of the present
invention provides a technique for recognizing hand
shape and position even if an image for recognition is
not oriented in the same manner as the other hand shape
images. This is implemented by normalizing the input hand
images obtained by several cameras also in hand
orientation. It is applicable to a case, for example, where
the hand shape image information storage parts 12A to
12C in the first to tenth embodiments store only
the hand shape images about the palm principal axis.

[0289] Such a technique in the eleventh embodiment
can be realized in a manner that the hand image
normalization part 21 in the first to tenth embodiments
additionally finds in which direction the hand is oriented
by setting the moment principal axis for each of the input
hand images from the several cameras, and then
normalizes the input hand images also in hand orientation.
[0290] Since the hand image normalization part 21
in the device of the eleventh embodiment is structurally
similar to the hand image normalization part 21 in the
devices in the first to tenth embodiments, no drawing is
provided therefor. FIG. 36 shows an exemplary concept
of a technique for normalizing the images after the hand
orientation is determined by several cameras. FIG. 36
exemplarily shows a case where three cameras are
used.

[0291] It is herein assumed that three cameras are 
used to photograph the hand at each location shown in 
FIG. 36. 

[0292] First of all, the hand image normalization
part 21 deletes a wrist region from each of the input 
hand images in a similar manner to the above embodi- 
ments. Thereafter, the hand image normalization part 
21 displaces the hand region of the input hand image 
from which the wrist region is deleted to the center of 
the image, and then determines the direction of the 
moment principal axis in the hand region (same manner 
as for the region displacement part 64 and the rotation 
angle calculation part 65 in the tenth embodiment). 
Next, by referring to the moment principal axis, the hand 
image normalization part 21 calculates the direction of 
the principal axis in the 3D space as a vector value, and 
then calculates a conversion matrix in such a manner 
that the principal axis is directed to be perpendicular to 
the optical axis. Thereafter, according to the calculated 
conversion matrix, the hand image normalization part 
21 changes the shape of the input hand images taken in 
from the cameras. Herein, the input hand images may
be changed in shape by applying a general affine
transformation.

[0293] As is known from the above, according to the
device and the method for recognizing hand shape and
position of the eleventh embodiment, even in a case
where the hand shape image information storage parts
12A to 12C store only the hand shape images
about the palm principal axis, the hand shape and
position can be recognized for an image which is not
oriented in the same manner as the other hand shape images.
[0294] While the invention has been described in 
detail, the foregoing description is in all aspects illustra- 
tive and not restrictive. It is understood that numerous 
other modifications and variations can be devised with- 
out departing from the scope of the invention. 

Claims 

1. A device for recognizing hand shape and position of
a hand image obtained by optical read means
(hereinafter, referred to as input hand image), the
device comprising:

first hand image normalization means (11) for
receiving a plurality of hand images varied in
hand shape and position, and after a wrist
region is respectively deleted therefrom, subjecting
the hand images to normalization in a
predetermined manner (in hand orientation,
image size, image contrast) to generate hand
shape images;

hand shape image information storage means
(12A) for storing said hand shape images
together with shape information and position
information about each of the hand shape
images;

eigenspace calculation means (13) for calculating
an eigenvalue and an eigenvector from
each of said hand shape images under analysis
based on an eigenspace method;

eigenvector storage means (14) for storing said
eigenvectors;

first eigenspace projection means (15) for
calculating eigenspace projection coordinates
respectively for said hand shape images by
projecting the hand shape images onto an
eigenspace having said eigenvectors as a
basis, and storing the eigenspace projection
coordinates into said hand shape image
information storage means (12A);

second hand image normalization means (21)
for receiving said input hand image, and after a
wrist region is deleted therefrom, normalizing
the input hand image to generate an input hand
shape image being equivalent to said hand
shape images;

second eigenspace projection means (22) for
calculating eigenspace projection coordinates
for said input hand shape image by projecting
the input hand shape image onto the
eigenspace having said eigenvectors as the
basis;

hand shape image selection means (23) for
comparing said eigenspace projection coordinates
calculated by said second eigenspace
projection means (22) with said eigenspace
projection coordinates stored in said hand
shape image information storage means (12A),
and determining which of said hand shape
images is closest to said input hand shape
image; and

shape/position output means (24) for obtaining,
for output, said shape information and said
position information on said closest hand
shape image from said hand shape image
information storage means (12A).

2. A device for recognizing hand shape and position of
a hand image obtained by optical read means
(hereinafter, referred to as input hand image), the
device comprising:

first hand image normalization means (11) for
receiving a plurality of hand images varied in
hand shape and position, and after a wrist
region is respectively deleted therefrom, subjecting
the hand images to normalization in a
predetermined manner (in hand orientation,
image size, image contrast) to generate hand
shape images;

hand shape image information storage means
(12B) for storing said hand shape images
together with shape information and position
information about each of the hand shape
images;

eigenspace calculation means (13) for calculating
an eigenvalue and an eigenvector from
each of said hand shape images under analysis
based on an eigenspace method;

eigenvector storage means (14) for storing said
eigenvectors;

first eigenspace projection means (15) for
calculating eigenspace projection coordinates
respectively for said hand shape images by
projecting the hand shape images onto an
eigenspace having said eigenvectors as a
basis, and storing the eigenspace projection
coordinates into said hand shape image
information storage means (12B);

cluster evaluation means (16, 18) for classifying,
into clusters, said eigenspace projection
coordinates under cluster evaluation, determining
which of said hand shape images belongs
to which cluster for storage into said hand
shape image information storage means (12B),
and obtaining statistical information about each
cluster;

cluster information storage means (17A, 17B)
for storing each of said statistical information
together with the cluster corresponding thereto;

second hand image normalization means (21)
for receiving said input hand image, and after a
wrist region is deleted therefrom, normalizing
the input hand image to generate an input hand
shape image being equivalent to said hand
shape images;

second eigenspace projection means (22) for
calculating eigenspace projection coordinates
for said input hand shape image by projecting
the input hand shape image onto the
eigenspace having said eigenvectors as the
basis;

maximum likelihood cluster judgement means
(25) for comparing said eigenspace projection
coordinates calculated by said second
eigenspace projection means (22) with each of
coordinates included in said statistical information
stored in said cluster information storage
means (17A, 17B), and determining which
cluster is the closest;

image comparison means (26, 27) for comparing
said hand shape images included in said
closest cluster with said input hand shape
image, and determining which of said hand
shape images is analogous most closely to the
input hand shape image; and

shape/position output means (24) for obtaining,
for output, said shape information and said
position information on said most analogous
hand shape image from said hand shape
image information storage means (12B).

3. The device for recognizing hand shape and position
as claimed in claim 2, wherein said image comparison
means (26, 27) includes:

identical shape classification means for classifying,
according to hand shape, said hand
shape images included in the cluster determined
by said maximum likelihood cluster
judgement means (25) into groups before comparing
the hand shape images with said input
hand shape image generated by said second
hand image normalization means (21);

shape group statistic calculation means for
calculating a statistic representing said groups;
and

maximum likelihood shape judgement means
for calculating a distance between said input
hand shape image and said statistic, and outputting
a hand shape included in the closest
group.

4. The device for recognizing hand shape and position
as claimed in claim 2, wherein said cluster evaluation
means (18) obtains said hand shape images
and said shape information for each cluster from
said hand shape image information storage means
(12B), calculates a partial region respectively for
said hand shape images for discrimination, and
stores the partial regions into said cluster information
storage means (17B); and

said image comparison means (27) compares
said hand shape images in the cluster determined
by said maximum likelihood cluster
judgement means (25) with said input hand
shape image generated by said second hand
image normalization means (21) only in said
partial region corresponding to said cluster.

5. The device for recognizing hand shape and position 
as claimed in claim 2, wherein, when said input 
hand image is plurally provided by photographing a 
hand from several directions, 

50 

said second hand image normalization means 
(21 ) generates said input hand shape image for 
each of said input hand images, 
said second eigenspace projection means (22) 
calculates the eigenspace projection coordi- 55 
nates in the eigenspace respectively for said 
input hand shape images generated by said 
second hand image normalization means (21), 



said maximum likelihood cluster judgement 
means (25) compares each of said eigenspace 
projection coordinates calculated by said sec- 
ond eigenspace projection means (22) with 
said statistical information, and determines 
which cluster is the closest, and 
said image comparison means (26, 27) merges 
said closest clusters determined by said maxi- 
mum likelihood cluster judgement means (25), 
and estimates hand shape and position consistent
with said shape information and said position
information about said hand shape images
in each of the clusters. 

6. A device for recognizing a meaning of successive 
hand images (hereinafter, referred to as hand 
movement image) obtained by optical read means, 
the device comprising: 

first hand image normalization means (11) for 
receiving a plurality of hand images varied in 
hand shape and position, and after a wrist 
region is respectively deleted therefrom, sub- 
jecting the hand images to normalization in a 
predetermined manner (in hand orientation, 
image size, image contrast) to generate hand 
shape images; 

hand shape image information storage means 
(12B, 12C) for storing said hand shape images 
together with shape information and position 
information about each of the hand shape 
images; 

eigenspace calculation means (13) for calculat- 
ing an eigenvalue and an eigenvector from 
each of said hand shape images under analy- 
sis based on an eigenspace method; 
eigenvector storage means (14) for storing said
eigenvectors; 

first eigenspace projection means (15) for cal- 
culating eigenspace projection coordinates 
respectively for said hand shape images by 
projecting the hand shape images onto an 
eigenspace having said eigenvectors as a 
basis, and storing the eigenspace projection 
coordinates into said hand shape image infor- 
mation storage means (12B, 12C); 
cluster evaluation means (16) for classifying, 
into clusters, said eigenspace projection coor- 
dinates under cluster evaluation, determining 
which of said hand shape images belongs to 
which cluster for storage into said hand shape 
image information storage means (12B, 12C), 
and obtaining statistical information about each 
cluster;

cluster information storage means (17A) for 
storing each of said statistical information 
together with the cluster corresponding thereto; 
hand region detection means (28, 48, 58) for 
receiving said hand movement image, and
detecting a hand region respectively from the 
hand images structuring the hand movement 
image; 

hand movement segmentation means (29) for
determining how the hand is moved in each of 
said detected hand regions, and finding any 
change point in hand movement according 
thereto; 

hand image cutting means (30) for cutting an
image corresponding to said detected hand 
region respectively from the images including 
the change points; 

second hand image normalization means (21) 
for respectively normalizing one or more hand
images (hereinafter, referred to as hand image 
series) cut from said hand movement image by 
said hand image cutting means (30), after a 
wrist region is each deleted therefrom, and 
generating input hand shape images being
equivalent to said hand shape images; 
second eigenspace projection means (22) for 
calculating eigenspace projection coordinates 
for each of said input hand shape images by 
projecting the input hand shape images onto
the eigenspace having said eigenvectors as 
the basis; 

maximum likelihood cluster judgement means 
(25) for comparing each of said eigenspace 
projection coordinates calculated by said second
eigenspace projection means (22) with
said statistical information stored in said cluster 
information storage means (17A), determining 
which cluster is the closest to each of the 
eigenspace projection coordinates, and outputting
a symbol each specifying the clusters;
series registration means (31) for registering, in 
series identification dictionary means, the symbols
(hereinafter, referred to as symbol series)
corresponding to said hand image series outputted
by said maximum likelihood cluster
judgement means (25) together with a meaning 
of said hand movement image; 
said series identification dictionary means (32) 
for storing the meaning of said hand movement
image and said symbol series corresponding 
thereto; and 

identification operation means (33A, 33B) for 
obtaining, for output, one of the meanings corresponding
to said symbol series outputted by
said maximum likelihood cluster judgement 
means (25) from said series identification dic- 
tionary means (32). 
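
The hand movement segmentation of claim 6 finds change points in the hand movement, but the claim does not fix a criterion. As a hedged illustration only, the sketch below assumes that a change point is a frame at which the hand region switches between resting and moving, judged from the displacement of its centroid between consecutive frames. The function name and the threshold parameter are hypothetical.

```python
import math


def change_points(centroids, still_threshold=1.0):
    """Return frame indices where the hand switches between resting and moving.

    centroids: one (x, y) hand-region centre per frame.
    still_threshold: assumed displacement (in pixels) below which the hand counts as resting.
    """
    points = []
    was_moving = False
    for i in range(1, len(centroids)):
        (x0, y0), (x1, y1) = centroids[i - 1], centroids[i]
        is_moving = math.hypot(x1 - x0, y1 - y0) > still_threshold
        if is_moving != was_moving:
            points.append(i)
        was_moving = is_moving
    return points
```

The frames at the returned indices would then be the ones passed on for hand image cutting.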

7. The device for recognizing hand shape and position
as claimed in claim 6, wherein the device further 
comprises: 



comprehensive movement recognition means 
(37) for receiving said hand movement image, 
and outputting a possibility for meaning by 
judging how the hand is moved and where the 
hand is located in the hand movement image; 
and 

restriction condition storage means (38) for 
previously storing a restriction condition for 
restricting, according to the successive hand 
movement, the meaning of said provided hand 
movement image, wherein 
said identification operation means (33B) 
obtains, for output, while taking said restriction 
condition into consideration, a meaning corre- 
sponding to said symbol series outputted by 
said maximum likelihood cluster judgement 
means (25) from said series identification dic- 
tionary means (32). 

8. The device for recognizing hand shape and position 
as claimed in claim 6, wherein said hand region 
detection means (48) includes: 

possible region cutting means (39) for cutting a 
possible hand region from each hand image 
structuring said input hand movement image; 
masking region storage means (40) for storing 
a masking region used to extract only the pos- 
sible hand region from an image of a rectangu- 
lar region; 

hand region image normalization means (41) 
for superimposing said masking region on each 
of the possible hand regions cut from said hand 
movement image, and normalizing each
thereof to generate an image equivalent to the 
hand images used to calculate said eigenvec- 
tors; 

hand region eigenspace projection means (22) 
for calculating eigenspace projection coordi- 
nates for said normalized images by projecting 
the images onto the eigenspace having said 
eigenvectors as the basis; 
hand region maximum likelihood cluster judge- 
ment means (25) for comparing each of said 
eigenspace projection coordinates calculated 
by said hand region eigenspace projection 
means (22) with said statistical information 
stored in said cluster information storage 
means (17A), determining which cluster is the
closest to each of the eigenspace projection 
coordinates, and outputting an estimate value 
indicating closeness between each of the symbols
specifying the cluster and a cluster for reference;
and

region determination means (42) for outputting, 
according to said estimation values, position 
information on said possible hand region 
whose said estimation value is the highest and 
the cluster thereof.

9. The device for recognizing hand shape and position 
as claimed in claim 7, wherein said hand region 
detection means (48) includes: 

possible region cutting means (39) for cutting a 
possible hand region from each image structuring
said input hand movement image;
masking region storage means (40) for storing 
a masking region used to extract only the pos- 
sible hand region from an image of a rectangu- 
lar region; 

hand region image normalization means (41) 
for superimposing said masking region on each 
of the possible hand regions cut from said hand 
movement image, and normalizing each 
thereof to generate an image equivalent to the 
hand images used to calculate said eigenvec- 
tors; 

hand region eigenspace projection means (22) 
for calculating eigenspace projection coordi- 
nates for said normalized images by projecting 
the images onto the eigenspace having said 
eigenvectors as the basis; 
hand region maximum likelihood cluster judge- 
ment means (25) for comparing each of said 
eigenspace projection coordinates calculated 
by said hand region eigenspace projection 
means (22) with said statistical information 
stored in said cluster information storage 
means (17A), determining which cluster is the 
closest to each of the eigenspace projection 
coordinates, and outputting an estimate value 
indicating closeness between each of the sym- 
bols specifying the cluster and a cluster for ref- 
erence; and 

region determination means (42) for outputting, 
according to said estimation values, position 
information on said possible hand region 
whose said estimation value is the highest and 
the cluster thereof. 
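
The region determination of claims 8 and 9 can be sketched as follows. The claims only require an estimate value indicating closeness to a cluster; the formula `1 / (1 + squared distance)` and the use of cluster means as the statistical information are assumptions made for this illustration, and all names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_hand_region(candidates, cluster_means):
    """Pick the candidate region whose projection lies closest to any cluster.

    candidates: list of (position, eigenspace_coordinates) pairs.
    cluster_means: dict mapping a cluster symbol to its mean coordinates (assumed statistic).
    Returns (position, cluster_symbol, estimate_value).
    """
    best = None
    for position, coord in candidates:
        symbol = min(cluster_means, key=lambda c: sq_dist(coord, cluster_means[c]))
        estimate = 1.0 / (1.0 + sq_dist(coord, cluster_means[symbol]))
        if best is None or estimate > best[2]:
            best = (position, symbol, estimate)
    return best
```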

10. The device for recognizing hand shape and position 
as claimed in claim 1, wherein said first hand image
normalization means (11) and said second hand 
image normalization means (21) respectively 
include: 

color distribution storage means (61) for previ- 
ously storing a color distribution of said hand 
region to be extracted from the input hand 
image; 

hand region extraction means (62) for extract- 
ing said hand region from an input hand image 
according to said color distribution; 
wrist region deletion means (63) for finding 
which direction a wrist is oriented, and deleting
a wrist region from said hand region according
to the direction; 

region displacement means (64) for displacing 
said hand region from which said wrist region is 
deleted to a predetermined location on the
image;

rotation angle calculation means (65) for calcu- 
lating a rotation angle in such a manner that the 
hand in said hand region is oriented to a predetermined
direction;

region rotation means (66) for rotating, accord- 
ing to said rotation angle, said hand region in 
such a manner that the hand therein is oriented 
to a direction; and 

size normalization means (67) for normalizing
said rotated hand region to be in a predetermined
size.
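
The displacement and rotation parts of the normalization in claim 10 can be illustrated with a short sketch. The claim does not state how the rotation angle is calculated; the sketch assumes the common image-moment approach, in which the principal axis of the hand region is aligned with the horizontal axis. Color-based extraction, wrist deletion, and size normalization are not shown, and all function names are hypothetical.

```python
import math


def centroid(pixels):
    """Centre of gravity of a hand region given as a list of (x, y) pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def principal_axis_angle(pixels):
    """Orientation of the region's principal axis from second-order central moments."""
    cx, cy = centroid(pixels)
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def normalize_pose(pixels):
    """Move the centroid to the origin and rotate the principal axis onto the x axis."""
    cx, cy = centroid(pixels)
    angle = -principal_axis_angle(pixels)
    c, s = math.cos(angle), math.sin(angle)
    return [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
            for x, y in pixels]
```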

11. The device for recognizing hand shape and position 
as claimed in claim 2, wherein said first hand image
normalization means (11) and said second hand 
image normalization means (21) respectively 
include: 

color distribution storage means (61) for previously
storing a color distribution of said hand
region to be extracted from the input hand 
image; 

hand region extraction means (62) for extracting
said hand region from an input hand image
according to said color distribution; 
wrist region deletion means (63) for finding 
which direction a wrist is oriented, and deleting 
a wrist region from said hand region according 
to the direction;

region displacement means (64) for displacing 
said hand region from which said wrist region is 
deleted to a predetermined location on the 
image; 

rotation angle calculation means (65) for calculating
a rotation angle in such a manner that the
hand in said hand region is oriented to a prede- 
termined direction; 

region rotation means (66) for rotating, according
to said rotation angle, said hand region in
such a manner that the hand therein is oriented 
to a direction; and 

size normalization means (67) for normalizing 
said rotated hand region to be in a predetermined
size.

12. The device for recognizing hand shape and position 
as claimed in claim 6, wherein said first hand image 
normalization means (11) and said second hand
image normalization means (21) respectively
include: 

color distribution storage means (61) for previously
storing a color distribution of said hand
region to be extracted from the input hand 
image; 

hand region extraction means (62) for extracting
said hand region from an input hand image
according to said color distribution; 
wrist region deletion means (63) for finding 
which direction a wrist is oriented, and deleting 
a wrist region from said hand region according 
to the direction;
region displacement means (64) for displacing 
said hand region from which said wrist region is 
deleted to a predetermined location on the 
image; 

rotation angle calculation means (65) for calculating
a rotation angle in such a manner that the
hand in said hand region is oriented to a prede- 
termined direction; 

region rotation means (66) for rotating, according
to said rotation angle, said hand region in
such a manner that the hand therein is oriented 
to a direction; and 

size normalization means (67) for normalizing
said rotated hand region to be in a predetermined
size.

13. The device for recognizing hand shape and position 
as claimed in claim 1, further comprising:

instruction storage means for storing an
instruction corresponding respectively to said 
shape information and said position informa- 
tion; and 

instruction output means for receiving said 
shape information and said position information
provided by said shape/position output
means, and obtaining, for output, the instruc- 
tion respectively corresponding to the shape 
information and the position information from 
said instruction storage means.

14. A method for recognizing hand shape and position 
of a hand image obtained by optical read means 
(hereinafter, referred to as input hand image), the 
method comprising:

a first normalization step of receiving a plurality 
of hand images varied in hand shape and posi- 
tion, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images
to normalization in a predetermined manner (in 
hand orientation, image size, image contrast) 
to generate hand shape images; 
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand
shape images under analysis based on an 
eigenspace method; 

a first projection step of calculating eigenspace
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said 
eigenvectors as a basis; 
a second normalization step of receiving said 
input hand image, and after a wrist region is 
deleted therefrom, normalizing the input hand 
image to generate an input hand shape image 
being equivalent to said hand shape images; 
a second projection step of calculating
eigenspace projection coordinates for said 
input hand shape image by projecting the input 
hand shape image onto the eigenspace having 
said eigenvectors as the basis; 
a comparison step of comparing said 
eigenspace projection coordinates calculated 
for said hand shape images with said 
eigenspace projection coordinates calculated 
for said input hand shape image, and determin- 
ing which of said hand shape images is closest 
to said input hand shape image; and 
a step of outputting said shape information and 
said position information on said closest hand 
shape image. 
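
The steps of claim 14 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed implementation: images are taken to be already normalized and flattened into equal-length vectors, the eigenvectors are those of the sample covariance matrix (found here by power iteration with deflation), and closeness is measured by Euclidean distance in the eigenspace. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def leading_eigenvectors(vectors, k, iters=100):
    """Return (mean, basis): the k leading covariance eigenvectors of the training set."""
    mu = mean_vector(vectors)
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    dim = len(mu)
    cov = [[sum(c[i] * c[j] for c in centered) / len(centered)
            for j in range(dim)] for i in range(dim)]
    basis = []
    for _ in range(k):
        v = [float(i + 1) for i in range(dim)]        # fixed, non-zero start vector
        start_norm = math.sqrt(sum(x * x for x in v))
        v = [x / start_norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:                            # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim))
        basis.append(v)
        # Deflation: remove the component just found before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)] for i in range(dim)]
    return mu, basis


def project(vector, mu, basis):
    """Eigenspace projection coordinates of one flattened image."""
    return [sum((x - m) * e for x, m, e in zip(vector, mu, axis)) for axis in basis]


def recognize(query, training, labels, k):
    """Return the label (shape/position information) of the closest training image."""
    mu, basis = leading_eigenvectors(training, k)
    coords = [project(t, mu, basis) for t in training]
    q = project(query, mu, basis)
    best = min(range(len(training)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(q, coords[i])))
    return labels[best]
```

In practice the training-side steps would run once and their results would be stored, so that only the projection and comparison of the input image run at recognition time.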

15. A method for recognizing hand shape and position 
of a hand image obtained by optical read means 
(hereinafter, referred to as input hand image), the 
method comprising: 

a first normalization step of receiving a plurality 
of hand images varied in hand shape and posi- 
tion, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images 
to normalization in a predetermined manner (in 
hand orientation, image size, image contrast) 
to generate hand shape images; 
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand 
shape images under analysis based on an 
eigenspace method; 

a first projection step of calculating eigenspace 
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said 
eigenvectors as a basis; 
an evaluation step of classifying, under cluster 
evaluation, said eigenspace projection coordi- 
nates into clusters, determining which of said 
hand shape images belongs to which cluster, 
and obtaining statistical information about each 
of the clusters; 

a second normalization step of receiving said 
input hand image, and after a wrist region is 
deleted therefrom, normalizing the input hand 
image to generate an input hand shape image 
being equivalent to said hand shape images; 
a second projection step of calculating
eigenspace projection coordinates for said 
input hand shape image by projecting the input 
hand shape image onto the eigenspace having 
said eigenvectors as the basis; 
a judgement step of comparing said
eigenspace projection coordinates calculated 
for said input hand shape image with each of 
said statistical information, and determining the 
closest cluster; 

a comparison step of comparing each of said
hand shape images included in said closest 
cluster with said input hand shape image, and 
determining which of said hand shape images 
is most analogous to the input hand shape 
image; and
a step of outputting said shape information and 
said position information on said most analo- 
gous hand shape image. 
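
The two-stage search of claim 15 (closest cluster first, then the most analogous image within that cluster) can be sketched as below. The claim leaves the statistical information open; the sketch assumes it is the mean of each cluster's eigenspace projection coordinates, and it assumes that image similarity is a pixel-wise squared distance. All names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_means(coords, cluster_ids):
    """Mean projection coordinates per cluster (assumed form of the statistical information)."""
    sums, counts = {}, {}
    for coord, cid in zip(coords, cluster_ids):
        acc = sums.setdefault(cid, [0.0] * len(coord))
        for i, x in enumerate(coord):
            acc[i] += x
        counts[cid] = counts.get(cid, 0) + 1
    return {cid: [s / counts[cid] for s in acc] for cid, acc in sums.items()}


def two_stage_match(query_coord, query_image, coords, images, cluster_ids):
    """Return (closest cluster id, index of the most analogous image in that cluster)."""
    means = cluster_means(coords, cluster_ids)
    best_cluster = min(means, key=lambda cid: sq_dist(query_coord, means[cid]))
    members = [i for i, cid in enumerate(cluster_ids) if cid == best_cluster]
    best_image = min(members, key=lambda i: sq_dist(query_image, images[i]))
    return best_cluster, best_image
```

Only the images of one cluster are compared directly with the input, which is what keeps the second stage cheap when the training set is large.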

16. The method for recognizing hand shape and position
as claimed in claim 15, wherein said comparison
step includes,

a step of classifying, into clusters, said hand 
shape images included in the cluster determined
in said judgement step before comparing
the hand shape images with said input
hand shape image generated in said second 
normalization step; 

a step of calculating a statistic representing
said clusters; and 

a step of calculating a distance between said 
input hand shape image and said statistic, and 
outputting a hand shape included in the closest 
cluster.

17. The method for recognizing hand shape and position
as claimed in claim 15, wherein, in said evaluation
step, according to said hand shape images and
said shape information, a partial region is calculated
respectively for said hand shape images for
discrimination, and 

in said comparison step, said hand shape 
images in the cluster determined in said judgement
step are compared with said input hand
shape image generated in said second normal- 
ization step only in said partial region corre- 
sponding to said cluster. 

18. The method for recognizing hand shape and posi- 
tion as claimed in claim 15, wherein, when said 
input hand image is plurally provided by photo- 
graphing a hand from several directions, 

in said second normalization step, said input 
hand shape image is generated for each of 
said input hand images, 



in said second projection step, eigenspace projection
coordinates in the eigenspace are calculated
respectively for said input hand shape
images generated in said second normaliza- 
tion step, 

in said judgement step, each of said 
eigenspace projection coordinates calculated 
in said second projection step is compared with 
said statistical information, and the closest 
cluster is determined, and 
in said comparison step, said closest clusters 
determined in said judgement step are 
merged, and hand shape and position consistent
with said shape information and said position
information about said hand shape images in
each of the clusters are estimated.

19. A method for recognizing a meaning of successive 
hand images (hereinafter, referred to as hand 
movement image) obtained by optical read means, 
the method comprising:

a first normalization step of receiving a plurality 
of hand images varied in hand shape and posi- 
tion, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images 
to normalization in a predetermined manner (in 
hand orientation, image size, image contrast) 
to generate hand shape images; 
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand 
shape images under analysis based on an 
eigenspace method; 

a first projection step of calculating eigenspace 
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said 
eigenvectors as a basis; 
an evaluation step of classifying, into clusters, 
said eigenspace projection coordinates under 
cluster evaluation, determining which of said 
hand shape images belongs to which cluster, 
and obtaining statistical information about each 
cluster; 

a detection step of receiving said hand move- 
ment image, and detecting a hand region 
respectively from the hand images structuring 
the hand movement image; 
a segmentation step of determining how the 
hand is moved in each of said detected hand 
regions, and finding any change point in hand 
movement according thereto; 
a cutting step of cutting an image correspond- 
ing to said detected hand region respectively 
from the images including the change points; 
a second normalization step of respectively 
normalizing one or more hand images (herein- 
after, referred to as hand image series) cut 
from said hand movement image, after a wrist
region is each deleted therefrom, and generat- 
ing input hand shape images being equivalent 
to said hand shape images; 
a second projection step of calculating
eigenspace projection coordinates for each of 
said input hand shape images by projecting the 
input hand shape images onto the eigenspace 
having said eigenvectors as the basis; 
a judgement step of comparing each of said
eigenspace projection coordinates calculated 
for said input hand shape images with said sta- 
tistical information, determining which cluster is 
the closest, and outputting a symbol each 
specifying the clusters;
a step of storing the symbols (hereinafter,
referred to as symbol series) corresponding to
said judged hand image series together with a 
meaning of said hand movement image; and 
an identification step of outputting, in order to
identify said hand movement image, a meaning 
corresponding to said judged symbol series 
based on said stored symbol series and mean- 
ing. 
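
The storing and identification steps of claim 19 amount to registering a symbol series with its meaning and later looking the series up. The sketch below assumes an exact match on the whole series, which the claim does not require; a real system might tolerate missing or repeated symbols. The class and method names are hypothetical.

```python
class SeriesDictionary:
    """Maps a series of cluster symbols to the meaning of a hand movement."""

    def __init__(self):
        self._entries = {}

    def register(self, symbols, meaning):
        """Store a symbol series together with its meaning."""
        self._entries[tuple(symbols)] = meaning

    def identify(self, symbols):
        """Return the stored meaning for an exactly matching series, or None."""
        return self._entries.get(tuple(symbols))
```

For example, after `register(["C3", "C7", "C3"], "hello")`, a later call `identify(["C3", "C7", "C3"])` returns `"hello"`, while an unregistered series returns `None`.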

20. The method for recognizing hand shape and position
as claimed in claim 19, wherein the method further
comprises:

a recognition step of receiving said hand movement
image, and outputting a possibility for
meaning by judging how the hand is moved 
and where the hand is located in the hand 
movement image; and 

a storage step of previously storing a restriction
condition for restricting, according to the suc- 
cessive hand movement, the meaning of said 
provided hand movement image, wherein 
said identification step of outputting, while taking
said restriction condition into consideration,
a meaning corresponding to said judged sym- 
bol series based on said stored symbol series 
and meaning. 

21. The method for recognizing hand shape and position
as claimed in claim 19, wherein said detection
step includes: 

a cutting step of cutting a possible hand region
from each hand image structuring said input
hand movement image;
a storage step of storing a masking region used
to extract only the possible hand region from an
image of a rectangular region;
a normalization step of superimposing said
masking region on each of the possible hand
regions cut from said hand movement image,
and normalizing each thereof to generate an
image equivalent to the hand images used to 
calculate said eigenvectors; 
a projection step of calculating eigenspace pro- 
jection coordinates for said normalized images 
by projecting the images onto the eigenspace 
having said eigenvectors as the basis; 
a judgement step of comparing each of said 
eigenspace projection coordinates with said 
statistical information, determining which clus- 
ter is the closest, and outputting an estimate 
value indicating closeness between each of the 
symbols specifying the cluster and a cluster for 
reference; and 

a determination step of outputting, according to 
said estimation values, position information on 
said possible hand region whose said estima- 
tion value is the highest and the cluster thereof. 

22. The method for recognizing hand shape and posi- 
tion as claimed in claim 20, wherein said detection 
step includes: 

a cutting step of cutting a possible hand region 
from each hand image structuring said input 
hand movement image; 

a storage step of storing a masking region used 
to extract only the possible hand region from an 
image of a rectangular region; 
a normalization step of superimposing said 
masking region on each of the possible hand 
regions cut from said hand movement image, 
and normalizing each thereof to generate an 
image equivalent to the hand images used to 
calculate said eigenvectors; 
a projection step of calculating eigenspace pro- 
jection coordinates for said normalized images 
by projecting the images onto the eigenspace 
having said eigenvectors as the basis; 
a judgement step of comparing each of said 
eigenspace projection coordinates with said 
statistical information, determining which cluster
is the closest, and outputting an estimate
value indicating closeness between each of the 
symbols specifying the cluster and a cluster for 
reference; and 

a determination step of outputting, according to 
said estimation values, position information on 
said possible hand region whose said estima- 
tion value is the highest and the cluster thereof. 

23. The method for recognizing hand shape and posi- 
tion as claimed in claim 15, wherein said first nor- 
malization step and said second normalization step 
respectively include: 

a color storage step of previously storing a 
color distribution of said hand region to be 
extracted from the input hand image; 
a step of extracting said hand region from an
input hand image according to said color distribution;
a step of finding which direction a wrist is oriented,
and deleting a wrist region from said
hand region according to the direction;
a step of displacing said hand region from
which said wrist region is deleted to a predetermined
location on the image;
a step of calculating a rotation angle in such a
manner that the hand in said hand region is oriented
to a predetermined direction;
a step of rotating, according to said rotation
angle, said hand region in such a manner that
the hand therein is oriented to a direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.

24. The method for recognizing hand shape and position
as claimed in claim 16, wherein said first normalization
step and said second normalization step
respectively include:

a color storage step of previously storing a
color distribution of said hand region to be
extracted from the input hand image;
a step of extracting said hand region from an
input hand image according to said color distribution;
a step of finding which direction a wrist is oriented,
and deleting a wrist region from said
hand region according to the direction;
a step of displacing said hand region from
which said wrist region is deleted to a predetermined
location on the image;
a step of calculating a rotation angle in such a
manner that the hand in said hand region is oriented
to a predetermined direction;
a step of rotating, according to said rotation
angle, said hand region in such a manner that
the hand therein is oriented to a direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.

25. The method for recognizing hand shape and position
as claimed in claim 20, wherein said first normalization
step and said second normalization step
respectively include:

a color storage step of previously storing a
color distribution of said hand region to be
extracted from the input hand image;
a step of extracting said hand region from an
input hand image according to said color distribution;
a step of finding which direction a wrist is oriented,
and deleting a wrist region from said
hand region according to the direction;
a step of displacing said hand region from
which said wrist region is deleted to a predetermined
location on the image;
a step of calculating a rotation angle in such a
manner that the hand in said hand region is oriented
to a predetermined direction;
a step of rotating, according to said rotation
angle, said hand region in such a manner that
the hand therein is oriented to a direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.

26. The method for recognizing hand shape and position
as claimed in claim 14, further comprising:

an instruction storage step of storing an
instruction corresponding respectively to said
shape information and said position information;
and
a step of receiving said shape information and
said position information outputted in said output
step, and obtaining, for output, the instruction
respectively corresponding to the shape
information and the position information stored
in said instruction storage step.

27. A recording medium having stored thereon a program to be
executed on a computer device for carrying out a
method for recognizing hand shape and position of
a hand image obtained by optical read means
(hereinafter, referred to as input hand image), the
program being for realizing an operational environment
on the computer device including:



a first normalization step of receiving a plurality
of hand images varied in hand shape and position,
and after a wrist region is respectively
deleted therefrom, subjecting the hand images
to normalization in a predetermined manner (in
hand orientation, image size, image contrast)
to generate hand shape images;
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand 
shape images under analysis based on an
eigenspace method;

a first projection step of calculating eigenspace 
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said
eigenvectors as a basis;

a second normalization step of receiving said 
input hand image, and after a wrist region is 
deleted therefrom, normalizing the input hand 
image to generate an input hand shape image
being equivalent to said hand shape images;

a second projection step of calculating 
eigenspace projection coordinates for said 
input hand shape image by projecting the input 
hand shape image onto the eigenspace having
said eigenvectors as the basis; 
a comparison step of comparing said 
eigenspace projection coordinates calculated 
for said hand shape images with said 
eigenspace projection coordinates calculated 
for said input hand shape image, and determin- 
ing which of said hand shape images is closest 
to said input hand shape image; and 
a step of outputting said shape information and 
said position information on said closest hand 
shape image. 

28. A recording medium having stored thereon a program to be
executed on a computer device for carrying out a 
method for recognizing hand shape and position of 
a hand image obtained by optical read means 
(hereinafter, referred to as input hand image), the 
program being for realizing an operational environ- 
ment on the computer device including: 

a first normalization step of receiving a plurality 
of hand images varied in hand shape and posi- 
tion, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images 
to normalization in a predetermined manner (in 
hand orientation, image size, image contrast) 
to generate hand shape images; 
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand 
shape images under analysis based on an 
eigenspace method; 

a first projection step of calculating eigenspace 
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said 
eigenvectors as a basis; 
an evaluation step of classifying, into clusters, 
said eigenspace projection coordinates under 
cluster evaluation, determining which of said 
hand shape images belongs to which cluster, 
and obtaining statistical information about each 
cluster; 

a second normalization step of receiving said 
input hand image, and after a wrist region is 
deleted therefrom, normalizing the input hand 
image to generate an input hand shape image 
being equivalent to said hand shape images; 
a second projection step of calculating 
eigenspace projection coordinates for said 
input hand shape image by projecting the input 
hand shape image onto the eigenspace having 
said eigenvectors as the basis; 
a judgement step of comparing said 
eigenspace projection coordinates calculated 
for said input hand shape image with each of 
coordinates included in said statistical information,
and determining which cluster is the closest;

a comparison step of comparing said hand 
shape images included in said closest cluster 
with said input hand shape image, and determining
which of said hand shape images is most closely
analogous to the input hand shape image; and

a step of outputting said shape information and 
said position information on said most analogous
hand shape image.
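
The two-stage search recited above (judgement step, then comparison step) can be sketched as follows. The sketch assumes the cluster assignment from the evaluation step is already available, takes the per-cluster mean as the "statistical information", and compares candidates in eigenspace coordinates rather than in image space. All three choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def cluster_means(coords, cluster_ids):
    """Statistical information per cluster: here simply the mean coordinate."""
    groups = defaultdict(list)
    for c, cid in zip(coords, cluster_ids):
        groups[cid].append(c)
    return {cid: [sum(col) / len(pts) for col in zip(*pts)]
            for cid, pts in groups.items()}

def closest_cluster(query, means):
    """Judgement step: pick the cluster whose mean is nearest to the query."""
    return min(means, key=lambda cid: math.dist(query, means[cid]))

def most_analogous(query, coords, cluster_ids, means):
    """Comparison step: search only the members of the closest cluster."""
    cid = closest_cluster(query, means)
    members = [i for i, c in enumerate(cluster_ids) if c == cid]
    best = min(members, key=lambda i: math.dist(query, coords[i]))
    return cid, best
```

Restricting the final comparison to one cluster means the number of image-to-image comparisons depends on the cluster size rather than on the total number of stored hand shape images.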

29. The recording medium as claimed in claim 28, wherein
said comparison step includes: 

a step of classifying, into clusters, said hand
shape images included in the cluster determined
in said judgement step before comparing
the hand shape images with said input
hand shape image generated in said second
normalization step;

a step of calculating a statistic representing
said clusters; and 

a step of calculating a distance between said 
input hand shape image and said statistic, and 
outputting a hand shape included in the closest
cluster.

30. The recording medium as claimed in claim 28, 
wherein, in said evaluation step, according to said
hand shape images and said shape information, a
partial region is calculated respectively for said 
hand shape images for discrimination, and 

in said comparison step, said hand shape 
images in the cluster determined in said judgement
step are compared with said input hand
shape image generated in said second normal- 
ization step only in said partial region corre- 
sponding to said cluster. 
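
A comparison limited to a partial region can be sketched as below. The claim does not say how the partial region is calculated, so the sketch assumes one plausible rule: within a cluster, the pixels whose values vary across the member images are treated as the discriminative region. That rule, the `min_variance` threshold and the function names are all assumptions.

```python
import math

def discriminative_region(images, min_variance=0.0):
    """Indices of pixels whose variance across the cluster exceeds a threshold."""
    n = len(images)
    region = []
    for i, col in enumerate(zip(*images)):
        mean = sum(col) / n
        var = sum((p - mean) ** 2 for p in col) / n
        if var > min_variance:
            region.append(i)
    return region

def masked_distance(a, b, region):
    """Euclidean distance computed only over the pixels in the partial region."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in region))
```

Pixels that are identical across every image in the cluster carry no information for telling its members apart, so leaving them out keeps noise in those pixels from affecting the comparison.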

31. The recording medium as claimed in claim 28, 
wherein, when a plurality of said input hand images
are provided by photographing a hand from several
directions,

in said second normalization step, said input 
hand shape image is generated for each of 
said input hand images, 

in said second projection step, eigenspace projection
coordinates in the eigenspace are calculated
respectively for said input hand shape
images generated in said second normaliza- 
tion step, 

in said judgement step, each of said 
eigenspace projection coordinates calculated
in said second projection step is compared with 
said statistical information, and the closest 
cluster is determined, and
in said comparison step, said closest clusters
determined in said judgement step are 
merged, and hand shape and position consistent
with said shape information and said position
information about said hand shape images in
each of the clusters are estimated.

32. A recording medium having stored thereon a program to be
executed on a computer device for carrying out a
method for recognizing a meaning of successive
hand images (hereinafter, referred to as hand 
movement image) obtained by optical read means, 
the program being for realizing an operational envi- 
ronment on the computer device including: 

a first normalization step of receiving a plurality 
of hand images varied in hand shape and posi- 
tion, and after a wrist region is respectively 
deleted therefrom, subjecting the hand images 
to normalization in a predetermined manner (in
hand orientation, image size, image contrast) 
to generate hand shape images; 
an analysis step of calculating an eigenvalue 
and an eigenvector from each of said hand 
shape images under analysis based on an
eigenspace method; 

a first projection step of calculating eigenspace 
projection coordinates respectively for said 
hand shape images by projecting the hand 
shape images onto an eigenspace having said
eigenvectors as a basis; 
an evaluation step of classifying, into clusters, 
said eigenspace projection coordinates under 
cluster evaluation, determining which of said 
hand shape images belongs to which cluster,
and obtaining statistical information about each 
cluster; 

a detection step of receiving said hand move- 
ment image, and detecting a hand region 
respectively from the hand images structuring
the hand movement image; 
a segmentation step of determining how the 
hand is moved in each of said detected hand 
regions, and finding any change point in hand 
movement according thereto;
a cutting step of cutting an image correspond- 
ing to said detected hand region respectively 
from the images including the change points; 
a second normalization step of respectively 
normalizing one or more hand images (hereinafter,
referred to as hand image series) cut
from said hand movement image, after a wrist 
region is each deleted therefrom, and generat- 
ing input hand shape images being equivalent 
to said hand shape images;
a second projection step of calculating 
eigenspace projection coordinates for each of 
said input hand shape images by projecting the
input hand shape images onto the eigenspace
having said eigenvectors as the basis; 
a judgement step of comparing each of said 
eigenspace projection coordinates calculated 
for said input hand shape images with said sta- 
tistical information, determining which cluster is 
the closest, and outputting symbols each
specifying the respective cluster;

a step of storing the symbols (hereinafter, 
referred to as symbol series) corresponding to
said judged hand image series together with a 
meaning of said hand movement image; and 
an identification step of outputting, in order to 
identify said hand movement image, a meaning 
corresponding to said judged symbol series 
based on said stored symbol series and mean- 
ing. 
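
The storing and identification steps can be pictured as a lookup from a symbol series to a meaning. The sketch below assumes exact matching of the series after collapsing consecutive repeated symbols. The collapsing rule is an illustrative assumption, not something the claim recites, and `SymbolSeriesStore` is a hypothetical name.

```python
def collapse_repeats(symbols):
    """Drop consecutive duplicates so that A, A, B and A, B give the same key."""
    out = []
    for s in symbols:
        if not out or out[-1] != s:
            out.append(s)
    return tuple(out)

class SymbolSeriesStore:
    """Stores symbol series together with the meaning of the hand movement."""

    def __init__(self):
        self._meanings = {}

    def store(self, symbols, meaning):
        self._meanings[collapse_repeats(symbols)] = meaning

    def identify(self, symbols):
        """Return the stored meaning for a judged symbol series, or None."""
        return self._meanings.get(collapse_repeats(symbols))
```

A practical system would probably need a tolerant match (for example an edit-distance or probabilistic comparison) instead of an exact dictionary lookup, since a single misjudged cluster would otherwise prevent identification.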

33. The recording medium as claimed in claim 32, fur- 
ther comprising: 

a recognition step of receiving said hand move- 
ment image, and outputting a possibility for 
meaning by judging how the hand is moved 
and where the hand is located in the hand 
movement image; and 

a storage step of previously storing a restriction 
condition for restricting, according to the suc- 
cessive hand movement, the meaning of said 
provided hand movement image, wherein 
said identification step outputs, while taking
said restriction condition into consideration,
a meaning corresponding to said judged sym- 
bol series based on said stored symbol series 
and meaning. 

34. The recording medium as claimed in claim 32, 
wherein said detection step includes: 

a cutting step of cutting a possible hand region 
from each hand image structuring said input 
hand movement image; 
a storage step of storing a masking region used 
to extract only the possible hand region from an 
image of a rectangular region; 
a normalization step of superimposing said 
masking region on each of the possible hand 
regions cut from said hand movement image, 
and normalizing each thereof to generate an 
image equivalent to the hand images used to 
calculate said eigenvectors; 
a projection step of calculating eigenspace pro- 
jection coordinates for said normalized images 
by projecting the images onto the eigenspace 
having said eigenvectors as the basis; 
a judgement step of comparing each of said 
eigenspace projection coordinates with said 
statistical information, determining which cluster
is the closest, and outputting an estimate
value indicating closeness between each of the 
symbols specifying the cluster and a cluster for 
reference; and 

a determination step of outputting, according to 
said estimation values, position information on 
said possible hand region whose said estima- 
tion value is the highest and the cluster thereof. 
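
The detection steps above amount to scoring each possible hand region and keeping the best one. In the sketch below, `estimate` is a hypothetical callable standing in for the normalization, projection and judgement steps together; it is assumed to return a `(cluster, value)` pair in which a higher value means a closer match. Regions and the masking region are represented as nested lists of equal size.

```python
def apply_mask(region, mask):
    """Keep only the pixels where the masking region is non-zero."""
    return [[p if m else 0 for p, m in zip(region_row, mask_row)]
            for region_row, mask_row in zip(region, mask)]

def detect_hand(candidates, mask, estimate):
    """Determination step: return (position, cluster, value) of the best candidate.

    candidates: list of (position, region) pairs cut from one frame.
    estimate:   assumed scoring callable, masked region -> (cluster, value).
    """
    best = None
    for position, region in candidates:
        cluster, value = estimate(apply_mask(region, mask))
        if best is None or value > best[2]:
            best = (position, cluster, value)
    return best
```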

35. The recording medium as claimed in claim 33, 
wherein said detection step includes: 

a cutting step of cutting a possible hand region 
from each hand image structuring said input 
hand movement image; 

a storage step of storing a masking region used 
to extract only the possible hand region from an 
image of a rectangular region; 
a normalization step of superimposing said 
masking region on each of the possible hand 
regions cut from said hand movement image, 
and normalizing each thereof to generate an 
image equivalent to the hand images used to 
calculate said eigenvectors; 
a projection step of calculating eigenspace pro- 
jection coordinates for said normalized images 
by projecting the images onto the eigenspace 
having said eigenvectors as the basis; 
a judgement step of comparing each of said 
eigenspace projection coordinates with said 
statistical information, determining which clus- 
ter is the closest, and outputting an estimate 
value indicating closeness between each of the 
symbols specifying the cluster and a cluster for 
reference; and 

a determination step of outputting, according to 
said estimation values, position information on 
said possible hand region whose said estima- 
tion value is the highest and the cluster thereof. 

36. The recording medium as claimed in claim 28, 
wherein said first normalization step and said sec- 
ond normalization step respectively include: 

a color storage step of previously storing a 
color distribution of said hand region to be 
extracted from the input hand image; 
a step of extracting said hand region from an 
input hand image according to said color distri- 
bution; 

a step of finding which direction a wrist is ori- 
ented, and deleting a wrist region from said 
hand region according to the direction; 
a step of displacing said hand region from 
which said wrist region is deleted to a predeter- 
mined location on the image; 
a step of calculating a rotation angle in such a 
manner that the hand in said hand region is
oriented to a predetermined direction;
a step of rotating, according to said rotation 
angle, said hand region in such a manner that 
the hand therein is oriented to said predetermined
direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.
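
The displacement, rotation and size normalization steps can be sketched on a hand region given as a list of `(x, y)` pixel coordinates. The claim does not state how the rotation angle is calculated; the sketch swaps in the principal-axis angle from second-order image moments, a common technique that is assumed here and not taken from the source. Color-based extraction and wrist deletion are taken as already done.

```python
import math

def centroid(pixels):
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def principal_axis_angle(pixels):
    """Orientation of the region's long axis from central second moments."""
    cx, cy = centroid(pixels)
    mu20 = sum((x - cx) ** 2 for x, _ in pixels)
    mu02 = sum((y - cy) ** 2 for _, y in pixels)
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels)
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)

def normalize_region(pixels, target_angle=math.pi / 2, target_extent=1.0):
    """Move the centroid to the origin, rotate the long axis to target_angle,
    and scale so the largest coordinate magnitude equals target_extent."""
    cx, cy = centroid(pixels)
    rot = target_angle - principal_axis_angle(pixels)
    c, s = math.cos(rot), math.sin(rot)
    moved = [((x - cx) * c - (y - cy) * s, (x - cx) * s + (y - cy) * c)
             for x, y in pixels]
    extent = max(max(abs(x), abs(y)) for x, y in moved) or 1.0
    k = target_extent / extent
    return [(x * k, y * k) for x, y in moved]
```

The moment-based angle is ambiguous by half a turn, so it cannot distinguish a hand pointing up from one pointing down; the wrist direction found in the preceding step would be needed to resolve that.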

37. The recording medium as claimed in claim 29, 
wherein said first normalization step and said second
normalization step respectively include:

a color storage step of previously storing a 
color distribution of said hand region to be 
extracted from the input hand image; 
a step of extracting said hand region from an
input hand image according to said color distri- 
bution; 

a step of finding which direction a wrist is oriented,
and deleting a wrist region from said
hand region according to the direction;

a step of displacing said hand region from 
which said wrist region is deleted to a predeter- 
mined location on the image; 
a step of calculating a rotation angle in such a
manner that the hand in said hand region is
oriented to a predetermined direction;
a step of rotating, according to said rotation 
angle, said hand region in such a manner that 
the hand therein is oriented to said predetermined
direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.

38. The recording medium as claimed in claim 33, 
wherein said first normalization step and said second
normalization step respectively include:

a color storage step of previously storing a 
color distribution of said hand region to be 
extracted from the input hand image; 
a step of extracting said hand region from an
input hand image according to said color distri- 
bution; 

a step of finding which direction a wrist is oriented,
and deleting a wrist region from said
hand region according to the direction;

a step of displacing said hand region from 
which said wrist region is deleted to a predeter- 
mined location on the image; 
a step of calculating a rotation angle in such a
manner that the hand in said hand region is
oriented to a predetermined direction;
a step of rotating, according to said rotation 
angle, said hand region in such a manner that 
the hand therein is oriented to said predetermined
direction; and
a step of normalizing said rotated hand region
to be in a predetermined size.

39. The recording medium as claimed in claim 27, further
comprising:

an instruction storage step of storing an 
instruction corresponding respectively to said 
shape information and said position information; and

a step of receiving said shape information and 
said position information outputted in said output
step, and obtaining, for output, the instruction
respectively corresponding to the shape
information and the position information stored 
in said instruction storage step. 
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FIG. 19 
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